Psychometrika 


VOLUME XXV, 1960
JANUARY-DECEMBER

Editorial Council

Chairman: Harold Gulliksen
Editors: Dorothy Adkins Wood, Paul Horst
Managing Editor: Lyle V. Jones
Assistant Managing Editor: B. J. Winer

Editorial Board

R. L. Anderson        Wm. K. Estes          Quinn McNemar
T. W. Anderson        Henry E. Garrett      George A. Miller
Rolf Bargmann         Bert F. Green         Wm. G. Mollenkopf
Robert R. Bush        J. P. Guilford        Lincoln E. Moses
J. B. Carroll         Harold Gulliksen      Frederick Mosteller
H. S. Conrad          Paul Horst            George E. Nicholson
C. H. Coombs          Lloyd G. Humphreys    M. W. Richardson
L. J. Cronbach        Truman L. Kelley      R. L. Thorndike
E. E. Cureton         Albert K. Kurtz       Warren S. Torgerson
Paul S. Dwyer         Frederic M. Lord      Ledyard Tucker
Allen Edwards         Irving Lorge          D. F. Votaw, Jr.
Max D. Engelhart      R. Duncan Luce        Dorothy Adkins Wood

Book Review Editor: John B. Carroll





PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 1407 SHERWOOD AVENUE 
RICHMOND 5, VIRGINIA 














CONTENTS

ULTIMATE CHOICE BETWEEN TWO ATTRACTIVE GOALS: PREDICTIONS FROM A MODEL
    Frederick Mosteller and Maurice Tatsuoka

REMARKS ON TUCKER'S INTER-BATTERY METHOD OF FACTOR ANALYSIS
    W. A. Gibson

MULTIDIMENSIONAL UNFOLDING: DETERMINING THE DIMENSIONALITY OF RANKED PREFERENCE DATA
    Joseph F. Bennett and William L. Hays

A CRITERION FOR SELECTING VARIABLES IN A REGRESSION ANALYSIS
    H. Linhart

SOME MULTIPLE CORRELATION AND PREDICTOR SELECTION METHODS
    Harry E. Anderson, Jr. and Benjamin Fruchter

TWO-ALTERNATIVE LEARNING SITUATIONS WITH PARTIAL REINFORCEMENT
    Mary I. Hanania

AN EMPIRICAL STUDY OF THE NORMALITY AND INDEPENDENCE OF ERRORS OF MEASUREMENT IN TEST SCORES
    Frederic M. Lord

BOOK REVIEWS

Yrjö Ahmavaara. On the Unified Factor Theory of Mind
    Review by Rolf Bargmann

Y. Ahmavaara and T. Markkanen. The Unified Factor Model
    Review by Rolf Bargmann

Solomon Kullback. Information Theory and Statistics
    Review by William J. McGill








VOLUME TWENTY-FIVE MARCH 1960 NUMBER 1 











CONTENTS (Cont.)

R. D. Luce and H. Raiffa. Games and Decisions
    Review by R. M. Thrall

Raymond B. Cattell. Personality and Motivation: Structure and Measurement
    Review by Ernest C. Tupes

D. R. Cox. Planning of Experiments

A. E. Maxwell. Experimental Design in Psychology and the Medical Sciences
    Review by Jack Sawyer

Anne Anastasi. Differential Psychology
    Review by John E. Milholland

R. Olavi Viitamäki. Personality Traits Between Puberty and Adolescence
    Review by J. A. Radcliffe

D. H. Stott. The Social Adjustment of Children. Manual to the Bristol Social Adjustment Guides
    Review by Anne Roe

Alphonse Chapanis. Research Techniques in Human Engineering
    Review by Leonard C. Mead

F. F. Stephan and P. J. McCarthy. Sampling Opinions
    Review by Eli S. Marks

S. N. Roy. Some Aspects of Multivariate Analysis
    Review by J. E. Keith Smith











PSYCHOMETRIKA, VOL. 25, NO. 1
MARCH, 1960


ULTIMATE CHOICE BETWEEN TWO ATTRACTIVE GOALS: 
PREDICTIONS FROM A MODEL* 


FREDERICK MOSTELLER
HARVARD UNIVERSITY

AND

MAURICE TATSUOKA
UNIVERSITY OF HAWAII


A mathematical model for two-choice behavior in situations where both choices are desirable is discussed. According to the model, one or the other choice is ultimately preferred, and a functional equation is given for the fraction of the population ultimately preferring a given choice. The solution depends upon the learning rates and upon the initial probabilities of the choices. Several techniques for approximating the solution of this functional equation are described. One of these leads to an explicit formula that gives good accuracy. This solution can be generalized to the two-armed bandit problem with partial reinforcement in each arm, or the equivalent T-maze problem. Another suggests good ways to program the calculations for a high-speed computer.


The immobility of Buridan's ass, who starved to death between two haystacks, has always seemed unreasonable. No doubt the story was invented to mock an equilibrium theory of behavior. One expects that any such equilibrium in approach-approach situations will be unstable: one of the attractive goals will be chosen. In this paper some properties that flow from a mathematical model for repetitive approach-approach behavior are discussed. In the model for behavior in these choice situations, an organism initially shifts its choices from one to another, but after a while settles upon a single choice.

Thus in the early part of the learning the theoretical organism may give some expression to the notion of an equilibrium by making different choices on different trials, but eventually even this behavior vanishes for the single organism. On the other hand, some organisms may ultimately choose one goal and others another, so that a notion of equilibrium or balance could be recaptured across a population of organisms. The quantitative aspects of a model for such behavior are investigated. The model employed is one discussed by Bush and Mosteller [1].

*Support for this research has been received from the National Science Foundation (Grant NSF-G2258), the National Institute of Mental Health (Grant M-2293), and the Laboratory of Social Relations, Harvard University.

We wish to acknowledge and express our appreciation for the cooperation and assistance given by Phillip J. Rulon, Albert Beaton, Wai-Ching Ho, and Donald Spearritt, who set up, programmed, and executed numerous calculations connected with the linear equations method of solution, and by Cleo Youtz for extensive calculations at every stage of the work. We also wish to thank Ray Twery and Robert R. Bush for permission to use in Table 3 some of the unpublished results of their calculations. Those calculations were made on the Illiac through the cooperation of the Digital Computer Laboratory of the University of Illinois, Dr. John P. Nash, Director.

A simple situation will be discussed first, then the mathematical problem 
encountered there will be related to the more complicated two-armed bandit 
problem with partial reinforcement on each arm. Suppose that on each 
trial of an infinite sequence an organism may respond (or choose) in one 
of two ways. For purposes of exposition, specify the ways as R and L (for 
right and left, say), so that for concreteness one can think of a rat choosing 
the left-hand or right-hand side in a T-maze, or a person choosing the left- 
hand or the right-hand button in a two-armed bandit situation. However, 
R and L are intended to stand for a general pair of attractive objects or 
responses, mutually exclusive and exhaustive, which lead to attractive 
goals. 

Suppose that on a given trial the probability of choosing R is $p$, and that of choosing L is $1 - p$, where as usual $0 \le p \le 1$. If R is chosen, then the probability of choosing R next time is increased to $\alpha_1 p + 1 - \alpha_1$, but if L is chosen the probability of choosing R next is reduced to $\alpha_2 p$, where $0 \le \alpha_1 \le 1$, $0 \le \alpha_2 \le 1$. The point is that when a reinforcing choice is made, that choice has an increased probability of being chosen next time, and both R and L are regarded as reinforcing. The asymmetry in the formulas comes from the fact that the notation uses the probability of choosing R, and not the probability of choosing the particular side chosen on each trial. The operators used to change the probabilities are discussed by Bush and Mosteller ([1], p. 154 ff.).

Suppose the organism continues making the choices and that his probabilities are adjusted after every trial according to the rules just given. Then it can be shown that sooner or later the organism stops making one of the choices and thereafter chooses only the other. An extreme example occurs if both $\alpha_1$ and $\alpha_2$ are zero: then the organism chooses forever what he chooses first (one-trial learning).
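The two operators just described can be tried out by simulation. The following is a minimal sketch, not part of the original calculations: the function name `simulate_attraction`, the cutoff `eps` used as a numerical stand-in for absorption, the number of runs, and the seed are all assumptions made for illustration.

```python
import random


def simulate_attraction(p, alpha1, alpha2, runs=4000, eps=1e-6,
                        max_trials=10_000, seed=0):
    """Estimate the fraction of simulated organisms ultimately attracted by R.

    Each run applies the two operators from the text until the probability
    of R is within `eps` of 0 or 1 (an assumed numerical proxy for the
    organism having settled on one choice).
    """
    rng = random.Random(seed)
    attracted_by_r = 0
    for _ in range(runs):
        prob_r = p
        for _ in range(max_trials):
            if prob_r < eps or prob_r > 1.0 - eps:
                break
            if rng.random() < prob_r:
                prob_r = alpha1 * prob_r + 1.0 - alpha1   # R chosen
            else:
                prob_r = alpha2 * prob_r                  # L chosen
        if prob_r > 0.5:
            attracted_by_r += 1
    return attracted_by_r / runs


if __name__ == "__main__":
    # Fraction of runs ending on R for one illustrative parameter choice.
    print(simulate_attraction(0.25, 0.75, 0.80))
```

With both coefficients set to zero the first choice decides every run, which is the one-trial learning case mentioned above.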

One mathematical problem is to discover the probability that the organism eventually chooses R rather than L all the time. If he does choose R all the time, then he is said to be "ultimately attracted by R," or R is "ultimately attracting." The desired probability should be expressible as a function of the initial probability $p$ and of the attractiveness coefficients $\alpha_1$ and $\alpha_2$ (the smaller an $\alpha$, the more attractive the side). For convenience, this will be called the simple approach-approach problem, in contrast to the more complicated partial reinforcement problems.

Consider now as an example a T-maze experiment with paradise fish described by Bush and Wilson [2]. On each trial of this experiment a fish started at one end of a tank and swam to the other, where the left or right side could be chosen. When the right-hand side was chosen, the fish was rewarded on 75 percent of the trials. When the left side was chosen, the fish was rewarded on 25 percent of the trials. The operation was to place the reward on one side or the other every time. In one group a fish was able to see the reward through a transparent divider when he chose the unrewarded side. In the other group an opaque divider was used. The data from these groups showed that the fish tended to stabilize on one side or the other.

Within the framework of the operators described earlier in this paper, if $p$ is the probability of choosing the right-hand side on a given trial, and if the right-hand side is chosen and rewarded, the new probability of choosing the right-hand side might be expressed as $\alpha p + 1 - \alpha$. If the left-hand side were chosen and rewarded, the new probability of choosing the right might be reduced to $\alpha p$. The parallel with the previous descriptions is very close.

But suppose the side chosen is not rewarded. Then, essentially, three 
possibilities exist. 

(a) The side chosen is more likely to be chosen than it was before. The 
explanation might be, for example, that the organism is building up a habit 
pattern, or that he is secondarily reinforced for being in a place that earlier 
was rewarding. 

(b) The side chosen is less likely to be chosen than before. The ex- 
planation might be, for example, that information has been received that 
this side is not paying off. 

Whatever the explanation may be, the models corresponding to (a) and to (b) make quite different predictions. The model for (a) says that the probability associated with the side chosen is always increased whether reward is given or not. This ultimately implies, for the operators described here, that one side is chosen every time, that is, that eventually the organism stabilizes on one side. On the other hand, the model for (b) would imply that the organism does not stabilize. To see this, suppose that an organism is certain ($p = 1$) to choose the right-hand side; that is, he has stabilized on the right. Then because of partial reinforcement the organism will experience some nonrewarded trials on the right-hand side. These will reduce the probability of choosing the right-hand side, and so the left-hand side will be chosen sometimes. A similar argument shows that the organism cannot stabilize on the left. Thus under partial reinforcement, a model for assumption (b) would typically have asymptotic instability. A subject does not become attracted by one side or the other, nor does he finally acquire a fixed probability $p$ of choosing R. Instead, his value of $p$ drifts up and down, though in a stochastically stable way. Thus model (a) has attracting and absorbing barriers, while model (b) has reflecting barriers.

(c) The probability is unchanged by a nonreward—then everything 
depends upon the rewarded trials. 













In the experiment with paradise fish the data suggest model (a). In 
this paper we shall deal with the type (a) model. On the basis of the model, 
we would like to know (in terms of the learning rates, the initial probabilities, 
and the probabilities of reward on the two sides) what fraction of the organisms 
will stabilize on a given side. 

Because the numerical problem has turned out to be rather troublesome, and because the general problem has some interest as shown by previous work, we will sketch various solutions that have been tried. Each of them is time-consuming in its development and testing, so a research worker will want to know what ground has already been plowed.


Previous Work 


To facilitate discussion of previous work on the simple approach-approach problem, a functional equation for the probability that an organism is ultimately attracted to R will be derived. Let $f(p_0; \alpha_1, \alpha_2)$ be the probability that an infinite sequence of trials ends in choices of R. Here, $p_0$ is the initial probability of choosing R. The transition rules are: if $p_n$ is the probability of R on trial $n$, then the probability of R on the next trial is

$$p_{n+1} = \begin{cases} \alpha_1 p_n + 1 - \alpha_1, & \text{if R is chosen on trial } n, \\ \alpha_2 p_n, & \text{if L is chosen on trial } n. \end{cases} \tag{1}$$

In the sequel there is usually no advantage in referring to the trial number associated with $p$, so the subscript on $p_0$ is dropped and $p$ stands for the initial probability. Similarly it is always to be understood that the desired function $f$ depends upon $\alpha_1$ and $\alpha_2$; so except when the full notation is needed, the notation $f(p)$ will be used.

The quantity $f(p)$ may be composed of two parts, corresponding to the choice of R or of L on the initial trial. Assume that each member of a large population has the same initial probability $p$ of choosing R and is faced with the same simple approach-approach problem. Then, on the first choice the fraction $p$ of the individuals choose R, and the new probability of R is $\alpha_1 p + 1 - \alpha_1$ for any member of this group. This means that in this group, the probability of being ultimately attracted by R is $f(\alpha_1 p + 1 - \alpha_1)$. Consequently this group contributes the portion $p\,f(\alpha_1 p + 1 - \alpha_1)$ to $f(p)$. In the same manner those organisms choosing L first contribute $(1 - p)\,f(\alpha_2 p)$ to $f(p)$. Thus one derives the basic functional equation for the simple approach-approach problem:

$$f(p) = p\,f(\alpha_1 p + 1 - \alpha_1) + (1 - p)\,f(\alpha_2 p). \tag{2}$$

The boundary conditions are $f(0) = 0$ and $f(1) = 1$. These conditions hold because if $p = 0$, then L occurs, and the new probability for R is $\alpha_2 \cdot 0 = 0$. Therefore L is always chosen. Similarly if $p = 1$, then R occurs, and the new probability for R is $\alpha_1 \cdot 1 + 1 - \alpha_1 = 1$. Therefore R is always chosen. Thus $f(0) = 0$ and $f(1) = 1$. These conditions for the function are needed because without them (2) only determines $f$ to within a linear transformation. Thus if a certain $f$ satisfies (2), direct substitution shows that $Af + B$ also satisfies it ($A$ and $B$ are constants).
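The substitution can be written out in two lines. The following is a sketch that assumes only that $f$ satisfies (2) and that $A$ and $B$ are constants; the weights $p$ and $1 - p$ sum to one, which is why the additive constant passes through unchanged.

```latex
\begin{aligned}
p\,[A f(\alpha_1 p + 1 - \alpha_1) + B] + (1 - p)\,[A f(\alpha_2 p) + B]
  &= A\,[\,p f(\alpha_1 p + 1 - \alpha_1) + (1 - p) f(\alpha_2 p)\,] + B\,[\,p + (1 - p)\,] \\
  &= A f(p) + B .
\end{aligned}
```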

Equation (2) could have had four parts if we related the desired probability to the four terms occurring after two trials, or more generally $2^n$ terms after $n$ trials. These equations are all equivalent, since they can all be derived by successive applications of (2) to the $f$'s appearing on the right-hand side.

The properties of $f(p)$ have been studied before by Bellman and by Shapiro ([3], Parts II and III), and by Karlin [4] (cf. [1], pp. 163-4). Since not all of their results are readily accessible, those properties of $f(p)$ especially useful here are given below.

i. Nature of the solution. Equation (2) has a unique, monotone, analytic solution once the boundary conditions are given. With our boundary conditions the solution is convex for $\alpha_1 > \alpha_2$, concave for $\alpha_1 < \alpha_2$. The monotonicity is consistent with the probability interpretation given by the learning model: for given $\alpha_1$ and $\alpha_2$, the larger the probability of choosing R initially, the more likely that R is ultimately attracting.

ii. Solutions under special conditions. In what follows, suppose the relevant boundary conditions $f(0) = 0$ and $f(1) = 1$ to hold. The special conditions have to do with the values assumed by one or both of the $\alpha$'s.

(a) $\alpha_1 = \alpha_2 \ne 1$. The solution is $f(p) = p$, as implied by the fact that $f(p)$ is both convex and concave and by the boundary conditions.

(b) $\alpha_1 = \alpha_2 = 1$. The function $f$ is not defined in our problem unless $p = 1$ or $0$, because the probability of R never changes and no attraction occurs.

(c) $\alpha_1 = 1$, $\alpha_2 \ne 1$. The occurrence of R leaves the probability of R unchanged because $\alpha_1 p + 1 - \alpha_1 = p$, so the process can only move toward choosing more L's unless $p = 1$. Thus $f(p; 1, \alpha_2) = 0$ for $\alpha_2 \ne 1$, $p \ne 1$, and $f(1; 1, \alpha_2) = 1$.

(d) $\alpha_2 = 1$, $\alpha_1 \ne 1$. Similarly $f(p; \alpha_1, 1) = 1$ for $\alpha_1 \ne 1$, $p \ne 0$, and $f(0; \alpha_1, 1) = 0$.

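(e) $\alpha_1 = 0$. Here, the only way to be ultimately attracted to L is always to choose L. The probability of the latter behavior is

$$g(p, \alpha_2) = (1 - p)(1 - \alpha_2 p)(1 - \alpha_2^2 p) \cdots = \prod_{i=0}^{\infty} (1 - \alpha_2^i p). \tag{3}$$

Therefore the probability of ultimate attraction by R is

$$f(p; 0, \alpha_2) = 1 - g(p, \alpha_2). \tag{4}$$

(f) $\alpha_2 = 0$. Here to be ultimately attracted by R is never to choose L. In this case

$$\begin{aligned}
f(p; \alpha_1, 0) &= p(\alpha_1 p + 1 - \alpha_1)[\alpha_1(\alpha_1 p + 1 - \alpha_1) + 1 - \alpha_1] \cdots \\
&= p(\alpha_1 p + 1 - \alpha_1)(\alpha_1^2 p + 1 - \alpha_1^2) \cdots \\
&= [1 - (1 - p)][1 - \alpha_1(1 - p)][1 - \alpha_1^2(1 - p)] \cdots \\
&= \prod_{i=0}^{\infty} [1 - \alpha_1^i (1 - p)].
\end{aligned} \tag{5}$$

In the second step above, note that if R occurs on the first $n$ trials, the probability of R is $\alpha_1^n p + 1 - \alpha_1^n$ (proved in [1], p. 59).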
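The infinite products in cases (e) and (f) converge quickly when the nonzero coefficient is below 1, so a truncated product gives usable numbers. The following is a minimal sketch under that assumption; the function names and the truncation length `terms` are illustrative choices, not part of the original work.

```python
def attraction_alpha1_zero(p, alpha2, terms=200):
    """Case (e): 1 - prod_{i>=0} (1 - alpha2**i * p), truncated after `terms` factors."""
    product = 1.0
    for i in range(terms):
        product *= 1.0 - alpha2 ** i * p
    return 1.0 - product


def attraction_alpha2_zero(p, alpha1, terms=200):
    """Case (f): prod_{i>=0} [1 - alpha1**i * (1 - p)], truncated after `terms` factors."""
    product = 1.0
    for i in range(terms):
        product *= 1.0 - alpha1 ** i * (1.0 - p)
    return product


if __name__ == "__main__":
    for p in (0.25, 0.50, 0.75):
        print(p, attraction_alpha1_zero(p, 0.80), attraction_alpha2_zero(p, 0.75))
```

Both functions can be checked against the functional equation with the corresponding coefficient set to zero, for instance that the case (e) value at $p$ equals $p$ plus $(1 - p)$ times its value at $\alpha_2 p$.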

iii. Iterative properties. Any continuous initial approximation to $f(p)$ can be iterated successively to obtain in the limit the function $f(p)$. That is, suppose $f_0(p)$ is a first guess at the function $f(p)$; then a better approximation is given by the first iterate

$$f_1(p) = p\,f_0(\alpha_1 p + 1 - \alpha_1) + (1 - p)\,f_0(\alpha_2 p).$$

For example if $f_0(p) = p$, then $f_1(p) = p + (\alpha_2 - \alpha_1)\,p(1 - p)$. More generally, the $(n + 1)$st iterate is given by

$$f_{n+1}(p) = p\,f_n(\alpha_1 p + 1 - \alpha_1) + (1 - p)\,f_n(\alpha_2 p).$$

Certain initial approximations lead to a monotonic sequence of iterates.

(a) If $f_0(p) = p$, the successive iterates monotonically increase toward $f(p)$ if $\alpha_2 > \alpha_1$, monotonically decrease toward $f(p)$ if $\alpha_2 < \alpha_1$.

(b) If the beginning approximation is

$$f_0(p) = \begin{cases} \displaystyle\prod_{i=0}^{\infty} [1 - \alpha_1^i (1 - p)], & \text{for } \alpha_2 < \alpha_1, \\[2ex] 1 - \displaystyle\prod_{i=0}^{\infty} [1 - \alpha_2^i p], & \text{for } \alpha_2 > \alpha_1, \end{cases} \tag{6}$$

the iterates increase (decrease) monotonically to the function. These results provide two sequences of bounds for $f(p)$ when the approximations mentioned in (a) and (b) are used.

The iteration procedure converges geometrically; that is, after $n$ iterations one can be sure that the $n$th iterate $f_n(p)$ deviates from the correct answer $f(p)$ by no more than $A\rho^n$, where $A > 0$ and $0 < \rho < 1$. Though geometric convergence sounds speedy, if $\rho$ were near 1, say 0.96, it would take more than 50 iterations to assure being within $0.1A$. The details needed for the calculation of $A$ and $\rho$ will not be provided.
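The iteration can be sketched numerically by tabulating each iterate on a grid. The following is an illustration only: representing the iterates on an equally spaced grid and filling in off-grid arguments by linear interpolation are assumptions of this sketch (they add a discretization error of their own), and the names `iterate_once` and `iterate` are hypothetical.

```python
def iterate_once(values, alpha1, alpha2):
    """Apply one iteration step to a function tabulated on an equal grid.

    `values[k]` holds f_n(k / m) with m = len(values) - 1; arguments that
    fall between grid points are handled by linear interpolation.
    """
    m = len(values) - 1

    def interp(x):
        j = min(int(x * m), m - 1)
        t = x * m - j
        return (1.0 - t) * values[j] + t * values[j + 1]

    new_values = []
    for k in range(m + 1):
        p = k / m
        new_values.append(p * interp(alpha1 * p + 1.0 - alpha1)
                          + (1.0 - p) * interp(alpha2 * p))
    return new_values


def iterate(alpha1, alpha2, n_iter, m=100):
    """Start from f_0(p) = p and apply `n_iter` iteration steps."""
    values = [k / m for k in range(m + 1)]
    for _ in range(n_iter):
        values = iterate_once(values, alpha1, alpha2)
    return values


if __name__ == "__main__":
    f = iterate(0.75, 0.80, 1500)
    print(f[25], f[50], f[75])   # tabulated values at p = 0.25, 0.50, 0.75
```

Because $f_0(p) = p$ is linear, the first iterate is reproduced exactly on the grid and can be compared with $p + (\alpha_2 - \alpha_1)p(1 - p)$.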

These important results provide a starting point for studying the function $f(p)$, but they do not yield numbers or expressions whose values are close to the true ones. In the remainder of this paper, several techniques for approximating $f(p)$ are provided.

A method designed for high-speed calculation will be considered first, then an excellent approximation obtained from a differential equation will be considered, and then that result will be extended to the two-armed bandit problem. Finally, brief mention of some other methods of approximating the solution of this functional equation will be given.


Approximation by Simultaneous Equations 


Consider a grid of numbers $0\ (= p_0),\ p_1,\ p_2,\ \ldots,\ p_n,\ 1\ (= p_{n+1})$ in the unit interval, and write the functional equation (2) as it applies to each of these values of the independent variable. (Lest confusion with earlier notation develop, note that $p_i$ still refers to probabilities, but the subscripts no longer correspond to trials as they did in earlier sections.) Then one has the set of equations

$$\begin{aligned}
f(0) &= 0 + f(0), \\
f(p_1) &= p_1 f(\alpha_1 p_1 + 1 - \alpha_1) + (1 - p_1) f(\alpha_2 p_1), \\
f(p_2) &= p_2 f(\alpha_1 p_2 + 1 - \alpha_1) + (1 - p_2) f(\alpha_2 p_2), \\
&\;\;\vdots \\
f(p_n) &= p_n f(\alpha_1 p_n + 1 - \alpha_1) + (1 - p_n) f(\alpha_2 p_n), \\
f(1) &= f(1) + 0.
\end{aligned} \tag{7}$$

The first and last members of this set of equations are, of course, tautologies; there are only $n$ nontrivial equations.

The right-hand sides of the $n$ nontrivial equations of the set (7) each involve the values of $f(p)$ at points that do not ordinarily coincide with any of the chosen grid points. However, by using an interpolation formula, both $f(\alpha_1 p_i + 1 - \alpha_1)$ and $f(\alpha_2 p_i)$, $i = 1, 2, \ldots, n$, may be approximated by linear combinations of the values of $f(p)$ at two or more consecutive grid points $p_j, p_{j+1}, \ldots$. The number of grid points required depends upon whether one uses linear interpolation (two grid points), interpolation with second differences (three points), third differences (four points), and so forth.

Whatever the number of points may be, each equation of the set (7) can be replaced by an approximate equality involving as unknowns just the values of $f(p)$ at several predetermined grid points, and these unknowns occur only linearly. Thus a system of $n$ linear equations is obtained, approximately satisfied by the $n$ unknown quantities $f(p_1), f(p_2), \ldots, f(p_n)$. The idea of deriving a system of linear equations whose roots approximate $f(p_i)$, $i = 1, 2, \ldots, n$, was first suggested to us by J. Arthur Greenwood in an unpublished memorandum, in which linear interpolation was used to approximate $f(\alpha_1 p_i + 1 - \alpha_1)$ and $f(\alpha_2 p_i)$.




In this and in the following sections a standard numerical example in which $\alpha_1 = 0.75$, $\alpha_2 = 0.80$ is used to illustrate the various methods. This example has the advantage of being easily displayed; further, numbers are fairly easy to compute from it. It has the disadvantage of being relatively easy to fit, so the reader should not be misled into thinking that the precision attained for it is always obtainable.

Example. The method just described is illustrated for a grid of five equally spaced points, using the standard example, $\alpha_1 = 0.75$, $\alpha_2 = 0.80$. Here, the functional equation is

$$f(p) = p\,f(0.75p + 0.25) + (1 - p)\,f(0.80p). \tag{8}$$

Taking $p_1 = 0.25$, $p_2 = 0.50$, $p_3 = 0.75$ and writing $f(p_i) = f_i$ for short, in accordance with equations (7),

$$\begin{aligned}
f_1 &= 0.25 f(0.4375) + 0.75 f(0.20), \\
f_2 &= 0.50 f(0.6250) + 0.50 f(0.40), \\
f_3 &= 0.75 f(0.8125) + 0.25 f(0.60).
\end{aligned} \tag{9}$$

First, linear interpolation will be used to approximate $f(0.4375)$, $f(0.20)$, $f(0.6250)$, etc., by means of linear combinations of the five $f$'s: $f_0\ (= 0)$, $f_1$, $f_2$, $f_3$, and $f_4\ (= 1)$. Thus,

$$f(0.4375) \approx \frac{0.5000 - 0.4375}{0.2500} f_1 + \frac{0.4375 - 0.2500}{0.2500} f_2 = 0.25 f_1 + 0.75 f_2,$$

$$f(0.20) \approx \frac{0.25 - 0.20}{0.25} f_0 + \frac{0.20 - 0}{0.25} f_1 = 0.80 f_1,$$

and, similarly,

$$\begin{aligned}
f(0.6250) &\approx 0.50 f_2 + 0.50 f_3, \\
f(0.40) &\approx 0.40 f_1 + 0.60 f_2, \\
f(0.8125) &\approx 0.75 f_3 + 0.25 f_4 = 0.75 f_3 + 0.25, \\
f(0.60) &\approx 0.60 f_2 + 0.40 f_3.
\end{aligned}$$

Substituting these approximate expressions for the several functional values in the right-hand sides of (9) and collecting all terms involving the unknowns into the left-hand sides, one obtains

$$\begin{aligned}
0.3375 f_1 - 0.1875 f_2 &\approx 0, \\
-0.2000 f_1 + 0.4500 f_2 - 0.2500 f_3 &\approx 0, \\
-0.1500 f_2 + 0.3375 f_3 &\approx 0.1875.
\end{aligned} \tag{10}$$

Replacing the $\approx$ by $=$ in the set of approximations (10) and solving the resulting equations, one obtains the following approximations to $f_i$. (The best available values are also shown for comparison.)

  p_i     f_i (approx.)    best values
  0.25       0.3385          0.4495
  0.50       0.6093          0.7286
  0.75       0.8276          0.8987

The agreement with the best available values is only fair.
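The assembly of the linear system can be sketched for any equally spaced grid. The following is an illustration under stated assumptions, not the program used for the original calculations: the helper names `build_system` and `gauss_solve` are hypothetical, the grid is taken as equally spaced, and a plain Gaussian elimination stands in for whatever solver was actually employed.

```python
def build_system(alpha1, alpha2, n_points):
    """Assemble A f = b for the interior unknowns f_1, ..., f_n on an equal grid."""
    m = n_points - 1          # number of intervals; grid points are k / m
    n = m - 1                 # number of interior unknowns
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n

    def subtract_interpolated(row, x, weight):
        # Move weight * f(x) to the left side, split over its two grid neighbours.
        j = min(int(x * m), m - 1)
        t = x * m - j
        for k, w in ((j, 1.0 - t), (j + 1, t)):
            if k == m:
                b[row] += weight * w          # f(1) = 1 goes to the right side
            elif k > 0:
                A[row][k - 1] -= weight * w   # f(0) = 0 contributes nothing

    for i in range(1, m):
        p = i / m
        A[i - 1][i - 1] += 1.0
        subtract_interpolated(i - 1, alpha1 * p + 1.0 - alpha1, p)
        subtract_interpolated(i - 1, alpha2 * p, 1.0 - p)
    return A, b


def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


if __name__ == "__main__":
    A, b = build_system(0.75, 0.80, 5)
    print(A, b)
    print(gauss_solve(A, b))
```

For five points the assembled coefficients are those of (10), and the floating-point solution should agree with the tabulated approximations to about two or three decimal places. Passing a larger `n_points` shows how slowly linear interpolation alone improves.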

Now use second-order interpolation for approximating the non-grid-point values of $f(p)$ that occur in the right-hand sides of (9). The general formula (with equally spaced grid points) is

$$f(x_i + \epsilon) \approx \frac{1}{2}\left(1 - \frac{\epsilon}{h}\right)\left(2 - \frac{\epsilon}{h}\right) f_i + \frac{\epsilon}{h}\left(2 - \frac{\epsilon}{h}\right) f_{i+1} - \frac{1}{2}\,\frac{\epsilon}{h}\left(1 - \frac{\epsilon}{h}\right) f_{i+2}, \tag{11}$$

where $h = x_{i+1} - x_i$. Note that (11) gives the interpolated value as a weighted average of the three adjacent tabled values instead of using differences.

Applying (11) to the problem at hand and substituting these approximate expressions into the right-hand sides of (9), one obtains the following system of approximations.

$$\begin{aligned}
0.2410 f_1 - 0.1744 f_2 + 0.0235 f_3 &\approx 0, \\
-0.1400 f_1 + 0.3925 f_2 - 0.3150 f_3 &\approx -0.0625, \\
-0.0497 f_2 + 0.1369 f_3 &\approx 0.0872,
\end{aligned} \tag{12}$$

whose roots yield the following approximations.

  p_i     f_i (approx.)    best values
  0.25       0.4279          0.4495
  0.50       0.7122          0.7286
  0.75       0.8955          0.8987

These results are a definite improvement over those obtained by linear interpolation.
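The coefficients of (12) can be rebuilt from the three-point weights in (11). The following is a sketch under the assumption that, for each off-grid argument, the base point $x_i$ is the grid point at or below it, shifted down where necessary so that $x_{i+2}$ stays on the grid; the function names are hypothetical.

```python
def quadratic_weights(u):
    """Weights on f_i, f_{i+1}, f_{i+2} for f(x_i + u*h), as in formula (11)."""
    return (0.5 * (1.0 - u) * (2.0 - u), u * (2.0 - u), -0.5 * u * (1.0 - u))


def second_order_rows(alpha1, alpha2, m=4):
    """Rows [c_1, ..., c_{m-1}, rhs] of the linear system on the grid k / m."""
    rows = []
    for i in range(1, m):
        p = i / m
        coeffs = [0.0] * (m + 1)        # coefficients of f_0 .. f_m, left side
        coeffs[i] += 1.0
        for x, weight in ((alpha1 * p + 1.0 - alpha1, p), (alpha2 * p, 1.0 - p)):
            j = min(int(x * m), m - 2)  # keep x_j, x_{j+1}, x_{j+2} on the grid
            u = x * m - j
            for k, w in zip(range(j, j + 3), quadratic_weights(u)):
                coeffs[k] -= weight * w
        # f_0 = 0 drops out; f_m = 1 moves to the right-hand side.
        rows.append(coeffs[1:m] + [-coeffs[m]])
    return rows


if __name__ == "__main__":
    for row in second_order_rows(0.75, 0.80):
        print([round(c, 4) for c in row])
```

With that choice of base points the rows agree with (12) to within rounding in the fourth decimal place.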

The above example seems to indicate that a considerable improvement 
of the approximation can be expected when higher differences are used in the 
interpolation formula for expressing the non-grid-point values of f(p) in 
terms of the grid-point values. However, the interpolation formulas become 
more and more cumbersome to work with numerically as higher differences 
are included. It therefore is pertinent to see how much improvement can 
be gained by increasing the number of grid points alone. 











TABLE 1

Improvement Obtained by Increasing the Number of Points in Grid, Using Linear Interpolation Only; Entries are Approximate Values of f.

                    Number of points
   p       4       5       6       11      21     Best value
  .10    .1347   .1473   .1573   .1864   .1984      .2055
  .25    .3147   .3388   .3557   .4152   .4375      .4495
  .50    .5690   .5955   .6129   .6872   .7133      .7286
  .75    .7845   .8007   .8153   .8666   .8856      .8987
  .90    .9138   .9203   .9261   .9476   .9586      .9658





Using only linear interpolation, approximations from grids of 4, 5, 6, 
11, and 21 points were obtained. These points were not equally spaced because 
it was hoped that better results would be obtained by spacing the grid so 
that the functional values would be approximately equally spaced. Infor- 
mation needed for such spacing was available from other methods described 
later. 

Linear interpolations were made in the results for the five grids de- 
scribed above to obtain approximate values at p = 0.10, 0.25, 0.50, 0.75, 
0.90. The numbers are shown in Table 1, together with the best known 
values. 

Using the difference between the best value and the cell entry for a given 
p; as a measure of error, it will be noted that, very roughly, the error decreases 
linearly with the spacing. On the other hand, with a five-point grid, changing 
from linear to second-order interpolation gives improvement roughly equiv- 
alent to that given by increasing the number of points to 21 and using linear 
interpolation only. Since simultaneous equations are expensive to solve, it 
appears that second-order interpolation is well worth the effort, contrary to 
usual advice. 

Calculations, with the aid of an electronic computer, using 21 grid 
points and second-difference interpolation as well as third-difference inter- 
polation have been made. The results are summarized in Table 2. The results 
obtained by using second-order differences are hardly distinguishable from 
those using third-order differences, though in a more sharply curved example 
they could be more useful. The third-order interpolation column provided 
numbers labeled ‘‘best values” throughout this paper. 

In principle, any desired degree of accuracy can be attained by using finer grids, but the cost of the calculations increases roughly as the square of the number of grid points used. A high-speed computer could be programmed to write its own equations and solve them, but such a program was not written. If good accuracy is required, the techniques proposed in this section are recommended.

TABLE 2

Approximations Using Second- and Third-Order Interpolations With 21 Grid Points and the Approximation by Second-Order Differential Equation

    p     f (second-order     f (third-order     f (differential
           interpolation)      interpolation)       equation)
   .00        .00000              .00000              .00000
   .05        .10718              .10778              .10325
   .10        .20495              .20547              .19839
   .15        .29407              .29455              .28601
   .20        .37528              .37564              .36648
   .25        .44919              .44947              .44035
   .30        .51637              .51620              .50774
   .35        .57739              .57736              .56982
   .40        .63279              .63277              .62626
   .45        .68304              .68305              .67764
   .50        .72858              .72859              .72435
   .55        .76981              .76983              .76672
   .60        .80711              .80713              .80504
   .65        .84082              .84084              .83967
   .70        .87126              .87127              .87088
   .75        .89873              .89874              .89891
   .80        .92349              .92350              .92402
   .85        .94578              .94579              .94648
   .90        .96584              .96584              .96648
   .95        .98387              .98387              .98425
  1.00       1.00000             1.00000             1.00000


Approximation by a Differential Equation 


An essential feature of the simultaneous-equations approximation discussed in the preceding section was the replacement of non-grid-point values of $f(p)$ by linear combinations of grid-point values. The continuous variable analogue of this procedure is the expansion of $f(\alpha_1 p + 1 - \alpha_1)$ and $f(\alpha_2 p)$ as Taylor's series in the neighborhood of $p$. This approach will now be used to derive a differential equation whose solution yields an approximation to the desired function, $f(p)$.

Rewriting $f(\alpha_1 p + 1 - \alpha_1)$ as $f(p + (1 - \alpha_1)(1 - p))$, and expanding the latter as a Taylor's series,

$$f(p + (1 - \alpha_1)(1 - p)) = f(p) + (1 - \alpha_1)(1 - p) f'(p) + \tfrac{1}{2}(1 - \alpha_1)^2 (1 - p)^2 f''(p) + \cdots, \tag{13}$$

where $f'$ and $f''$ are the first and second derivatives of $f$ with respect to $p$. Similarly, expand $f(\alpha_2 p)$ as follows:

$$f(\alpha_2 p) = f(p - (1 - \alpha_2)p) = f(p) - (1 - \alpha_2) p f'(p) + \tfrac{1}{2}(1 - \alpha_2)^2 p^2 f''(p) - \cdots. \tag{14}$$

Using only the terms through $f''(p)$ in the two series (13) and (14), substitute these expressions for the functions in the right-hand side of the functional equation (2). The result is a differential equation

$$\begin{aligned}
f(p) = {} & p\,[f(p) + (1 - \alpha_1)(1 - p) f'(p) + \tfrac{1}{2}(1 - \alpha_1)^2 (1 - p)^2 f''(p)] \\
& + (1 - p)\,[f(p) - (1 - \alpha_2) p f'(p) + \tfrac{1}{2}(1 - \alpha_2)^2 p^2 f''(p)].
\end{aligned} \tag{15}$$

By rearranging terms in (15) and dividing through by $p(1 - p)$,

$$\tfrac{1}{2}\{(1 - \alpha_1)^2 - [(1 - \alpha_1)^2 - (1 - \alpha_2)^2]\,p\}\, f''(p) + (\alpha_2 - \alpha_1) f'(p) = 0.$$

Hence,

$$\frac{f''(p)}{f'(p)} = -\frac{2(\alpha_2 - \alpha_1)}{(1 - \alpha_1)^2 - [(1 - \alpha_1)^2 - (1 - \alpha_2)^2]\,p}, \tag{16}$$

which is integrated to yield

$$f'(p) = C_1 \{(1 - \alpha_1)^2 - [(1 - \alpha_1)^2 - (1 - \alpha_2)^2]\,p\}^{1/(1 - \bar{\alpha})}, \tag{17}$$

where $C_1$ is a constant of integration, and $\bar{\alpha}$ is an abbreviation for $(\alpha_1 + \alpha_2)/2$. Integrating both sides of (17),

$$f(p) = C_1' \left[\frac{(1 - \alpha_1)^2}{(1 - \alpha_1)^2 - (1 - \alpha_2)^2} - p\right]^{1 + 1/(1 - \bar{\alpha})} + C_2, \tag{18}$$

where $C_1'$ and $C_2$ are new constants of integration. Determining $C_1'$ and $C_2$ from the boundary conditions $f(0) = 0$ and $f(1) = 1$, the final form of $f(p)$ is

$$f(p) = \frac{A^B - (A - p)^B}{A^B - (A - 1)^B}, \tag{19}$$

where

$$A = \frac{(1 - \alpha_1)^2}{(1 - \alpha_1)^2 - (1 - \alpha_2)^2}$$

and

$$B = \frac{1}{1 - (\alpha_1 + \alpha_2)/2} + 1.$$
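The exponent $1/(1 - \bar{\alpha})$ in the integrated form comes from factoring a difference of squares. A short worked step, assuming $\alpha_1 \ne \alpha_2$ and writing $a$ and $b$ (shorthand introduced only here) for the two constants in the denominator of the ratio $f''/f'$:

```latex
\begin{aligned}
a &= (1-\alpha_1)^2, \qquad
b = (1-\alpha_1)^2 - (1-\alpha_2)^2
  = (\alpha_2-\alpha_1)(2-\alpha_1-\alpha_2)
  = 2(\alpha_2-\alpha_1)(1-\bar{\alpha}), \\
\frac{f''(p)}{f'(p)} &= -\frac{2(\alpha_2-\alpha_1)}{a - bp}
\;\Longrightarrow\;
\ln f'(p) = \frac{2(\alpha_2-\alpha_1)}{b}\,\ln(a - bp) + \text{const}
          = \frac{1}{1-\bar{\alpha}}\,\ln(a - bp) + \text{const}.
\end{aligned}
```

Since $a - bp = b\,(A - p)$ with $A = a/b$, a second integration gives a power of $(A - p)$ with exponent $B = 1 + 1/(1 - \bar{\alpha})$.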




















Example: Taking $\alpha_1 = 0.75$, $\alpha_2 = 0.80$, as before, calculate the constants occurring in (19).

$$A = \frac{0.25^2}{0.25^2 - 0.20^2} = 2.7778, \qquad B = \frac{1}{1 - 0.775} + 1 = 5.4444.$$

Hence, from (19),

$$f(p) = \frac{260.42 - (2.7778 - p)^{5.4444}}{237.49}. \tag{20}$$

Using (20), calculate the values of $f(p)$ for $p = 0.25$, $0.50$, and $0.75$.

  p_i     f_i (approx.)    best values
  0.25       0.4403          0.4495
  0.50       0.7244          0.7286
  0.75       0.8989          0.8987

Values of $f(p)$ in intervals of 0.05 for $p$ are shown in Table 2, where they may be compared with the best values so far obtained. Among the various approximate methods which can be easily carried out with desk calculators, the differential equation method yields results in closest agreement with those obtained by the simultaneous equations using 21 grid points and third-difference interpolation.
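The closed form is simple enough to evaluate directly. The following is a minimal sketch of formula (19); the function name is hypothetical, and the restriction to $\alpha_1 < \alpha_2 < 1$ is an assumption of this sketch, made so that $A > 1$ and the power $(A - p)^B$ is taken of a positive number.

```python
def differential_equation_approx(p, alpha1, alpha2):
    """Closed-form approximation (19); this sketch assumes alpha1 < alpha2 < 1."""
    if not alpha1 < alpha2 < 1.0:
        raise ValueError("sketch covers only alpha1 < alpha2 < 1")
    a_const = (1.0 - alpha1) ** 2 / ((1.0 - alpha1) ** 2 - (1.0 - alpha2) ** 2)
    b_const = 1.0 / (1.0 - (alpha1 + alpha2) / 2.0) + 1.0
    top = a_const ** b_const - (a_const - p) ** b_const
    bottom = a_const ** b_const - (a_const - 1.0) ** b_const
    return top / bottom


if __name__ == "__main__":
    for p in (0.25, 0.50, 0.75):
        print(p, round(differential_equation_approx(p, 0.75, 0.80), 4))
```

For the standard example this evaluates the same expression as (20), without rounding $A$ and $B$ to four decimals first.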


The Two-Armed Bandit 


The differential equation approach can equally easily be applied to the 
more general model appropriate to the two-armed bandit problem with 
partial reinforcement on each arm (or the equivalent T-maze experiment). 

Suppose that there are two responses R and L, and whichever occurs
a reward or a nonreward follows. If R occurs, reward follows with
probability $\pi_1$; if L occurs, reward follows with probability $\pi_2$. If p is
the probability of R on a given trial, the new probability for R is as follows.

New probability                                   Probability
     for R              Event                     of happening
$\alpha_1 p + 1 - \alpha_1$    if R and reward occur     $\pi_1 p$
$\alpha_2 p + 1 - \alpha_2$    if R and nonreward        $(1 - \pi_1)p$
$\alpha_1 p$             if L and reward           $\pi_2(1 - p)$
$\alpha_2 p$             if L and nonreward        $(1 - \pi_2)(1 - p)$

These results represent a special case of those presented in ([1], pp. 118, 286)
and discussed briefly on p. 287 in the paragraph following equation (13.22)
in [1].





PSYCHOMETRIKA 


It has been assumed that reward is equally effective on either side and 
that nonreward is also equally effective on either side. It should be recalled 
that these transition rules imply that nonreward improves the probability 
of choosing a given side, as discussed in the opening section of this paper. 

Now in the same way that the basic functional equation (2) for the
simple approach-approach problem was derived, the basic functional
equation for the two-armed bandit problem with partial reinforcement can be
derived. The functional equation for the proportion f(p) of organisms who
eventually learn to make only response R is

$$f(p) = p\pi_1 f(\alpha_1 p + 1 - \alpha_1) + p(1 - \pi_1) f(\alpha_2 p + 1 - \alpha_2)
       + (1 - p)\pi_2 f(\alpha_1 p) + (1 - p)(1 - \pi_2) f(\alpha_2 p). \tag{21}$$

No generality is lost, and there is some gain in the sequel, if it is assumed
that $\alpha_1 < \alpha_2$ and $\pi_1 > \pi_2$. If $\pi_1 = 1$ and $\pi_2 = 0$, (21) reduces to (2).

Using the approximations (13) and (14) for $f(\alpha_i p + 1 - \alpha_i)$ and $f(\alpha_i p)$,
respectively, (21) can be rewritten, after rearrangement of terms, as

$$\left[(\pi_1 - \pi_2)\left\{(1 - \alpha_1)^2 - (1 - \alpha_2)^2\right\}p - \left\{\pi_1(1 - \alpha_1)^2 + (1 - \pi_1)(1 - \alpha_2)^2\right\}\right] f''(p)
  = 2(\pi_1 - \pi_2)(\alpha_2 - \alpha_1) f'(p). \tag{22}$$

The boundary conditions are f(0) = 0, f(1) = 1, as before.


Comparing (22) with the corresponding differential equation, (16),
for the simpler model, the general solution of (22) has the form

$$f(p) = \frac{A^B - (A - p)^B}{A^B - (A - 1)^B}, \tag{23}$$

where the constant A is now defined as

$$A = \frac{\pi_1(1 - \alpha_1)^2 + (1 - \pi_1)(1 - \alpha_2)^2}{(\pi_1 - \pi_2)\left[(1 - \alpha_1)^2 - (1 - \alpha_2)^2\right]}, \qquad (\alpha_1 \neq \alpha_2,\ \pi_1 \neq \pi_2)$$

while

$$B = \frac{1}{1 - (\alpha_1 + \alpha_2)/2} + 1,$$

as before. Note that the expression for A for the simple approach-approach
problem is obtained by substituting $\pi_1 = 1$, $\pi_2 = 0$ in the present A.

The expression for A is undefined when either $\alpha_1 = \alpha_2$ or $\pi_1 = \pi_2$;
hence (23) cannot be used. In each of these cases, however, it can be argued
from first principles that the function sought is f(p) = p. This result is also
given by the differential equation (22), which reduces to $f''(p) = 0$ under
these special conditions.
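
As an illustration of (23), the sketch below evaluates the approximation for the two-armed bandit and falls back to f(p) = p in the two degenerate cases named above. It assumes $\alpha_1 < \alpha_2$ and $\pi_1 > \pi_2$, as in the text; the function name `bandit_attraction` and the exact-equality test for the degenerate cases are choices made here for illustration only.

```python
def bandit_attraction(p, alpha1, alpha2, pi1, pi2):
    """Approximation (23) to the proportion ultimately attracted to R.

    Returns p itself when alpha1 == alpha2 or pi1 == pi2, where A is
    undefined and the differential equation reduces to f''(p) = 0.
    """
    if alpha1 == alpha2 or pi1 == pi2:
        return p
    d1 = (1.0 - alpha1) ** 2
    d2 = (1.0 - alpha2) ** 2
    A = (pi1 * d1 + (1.0 - pi1) * d2) / ((pi1 - pi2) * (d1 - d2))
    B = 1.0 / (1.0 - (alpha1 + alpha2) / 2.0) + 1.0
    return (A ** B - (A - p) ** B) / (A ** B - (A - 1.0) ** B)


if __name__ == "__main__":
    # Setting pi1 = 1, pi2 = 0 recovers the simple approach-approach model.
    print(f"{bandit_attraction(0.25, 0.75, 0.80, 1.0, 0.0):.4f}")
    print(f"{bandit_attraction(0.50, 0.90, 0.95, 0.75, 0.25):.3f}")
```

With $\pi_1 = 1$ and $\pi_2 = 0$ the function reduces to the simpler closed form (19), which gives a quick consistency check on the generalized constant A.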







Monte Carlo Calculations for Two-Armed Bandits 


Twery and Bush made a series of Monte Carlo calculations on Illiac
of f(0.50) for two-armed bandit experiments with $\pi_1 = 0.75$, $\pi_2 = 0.25$ for
various combinations of $\alpha$-values. The case of $\alpha_1 = 0.90$, $\alpha_2 = 0.95$ will be
used to calculate the value of f(0.50) from (23).

For the stated parameter values,

$$A = \frac{(0.75)(0.10)^2 + (0.25)(0.05)^2}{(0.50)\left[(0.10)^2 - (0.05)^2\right]} = 2.1667,$$

$$B = \frac{1}{1 - (1.85/2)} + 1 = 14.3333.$$

Hence, (23) in this case becomes

$$f(p) = \frac{65015.7 - (2.1667 - p)^{14.3333}}{65006.6}. \tag{24}$$

From this formula,

$$f(0.50) = 0.977,$$

compared with Twery and Bush's result, 0.970.
The values of f(0.50), calculated from (23) for the various combinations
of $\alpha$-values used by Twery and Bush, are shown in Table 3 along with
the Monte Carlo result obtained by these authors. Their numbers were
obtained in a pseudo-experiment in which 100 sequences of 800 trials each
were run with random numbers. The entry itself is the average value of p
for the 100 sequences at trial 800. Thus it has some random variation and
is pre-asymptotic to the extent that 800 trials is not an infinite number.
The agreement between the Monte Carlo results and the differential
equation is quite encouraging for the use of the differential-equation
method. It is surprisingly close, considering that only 100 sequences were used
and that the differential equation is only an approximation. On the other
hand, both learning parameters are near unity in these examples; in that
neighborhood the differential equation should be quite a good approximation.


TABLE 3

Comparison of Differential Equation Results (first entry)
With Those of Twery and Bush (second entry)
Obtained from the Mean Probability Level of 100 Sequences
at the 800th Trial for Various $\alpha_1$, $\alpha_2$ for p = 0.5
and $\pi_1 = 1 - \pi_2 = 0.75$

[Table entries are not legible in the source scan.]
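
A pseudo-experiment of the kind described can be sketched as follows. This is not Twery and Bush's program; it is a minimal simulation of the transition rules for the two-armed bandit (reward applies $\alpha_1$, nonreward applies $\alpha_2$, on either side), with a seeded generator from Python's standard library. The function names, the seed, and the default of 100 sequences of 800 trials are assumptions made for illustration, and the mean it returns varies from seed to seed.

```python
import random


def simulate_final_p(p0, alpha1, alpha2, pi1, pi2, trials, rng):
    """Run one sequence of trials and return the final probability of R."""
    p = p0
    for _ in range(trials):
        chose_r = rng.random() < p
        rewarded = rng.random() < (pi1 if chose_r else pi2)
        # Reward uses alpha1 and nonreward uses alpha2 on either side;
        # p moves toward 1 after response R and toward 0 after response L.
        alpha = alpha1 if rewarded else alpha2
        p = alpha * p + (1.0 - alpha) if chose_r else alpha * p
    return p


def monte_carlo_mean(p0, alpha1, alpha2, pi1, pi2,
                     sequences=100, trials=800, seed=0):
    """Average final probability of R over independent sequences."""
    rng = random.Random(seed)
    finals = [simulate_final_p(p0, alpha1, alpha2, pi1, pi2, trials, rng)
              for _ in range(sequences)]
    return sum(finals) / len(finals)


if __name__ == "__main__":
    print(monte_carlo_mean(0.5, 0.90, 0.95, 0.75, 0.25))
```

Because each sequence is very nearly absorbed at p = 0 or p = 1 after 800 trials, the mean is essentially the fraction of sequences attracted to R, which is the quantity the differential-equation value of f(0.50) approximates.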


T-maze Experiment with Paradise Fish 


In the first section of this paper, a T-maze experiment by Bush and
Wilson [2] using paradise fish was described. The rate of reward was 0.75
for response R and 0.25 for response L. In the notation of our model, $\pi_1 = 0.75$
and $\pi_2 = 0.25$. The learning-rate parameters were estimated to be $\alpha_1 = 0.916$
and $\alpha_2 = 0.942$ for the group in which the fish could see the reward through
a transparent divider when they chose the unrewarded side. The initial
probability for response R (estimated from results on the first 10 of the 140
trials) varied considerably from one fish to another, the average value being
0.496, or nearly 0.50. Bush and Wilson report that the initial distribution
of p approximately followed the symmetrical Beta distribution


(25) y = 3.61[p(1 — p)]”’. 


This initial distribution was used to calculate the expected fraction
attracted by R. The relative areas under the curve (25) in the ten intervals


[0, 0.1], [0.1, 0.2], ..., [0.9, 1.0]


were found, the values of f(p) at the midpoints of these intervals were
calculated, and their weighted average was obtained. The result was f(p) = 0.800.

In the experiment, Bush and Wilson found 15 of the 22 fish in the ex- 
perimental group making nearly all R responses after about 100 trials. This 
leads to the estimate 0.68 for the proportion ultimately attracted to the R 
response. That result is only about one standard error away from the fitted 
value 0.80. That small deviation does not even take any account of the 
unreliability of the original estimates of the a’s. 


Other Methods 


Several other methods of approximating the function have been explored.
One that was rather successful employed the function $f(p; \alpha, 0)$ or
$f(1 - p; 0, \alpha)$, choosing a value of $\alpha$ that made the iterate change very
little. This method was superior to an iteration technique beginning with
$f_0(p) = p$.

Since one knows exactly the solution to the functional equation in the
special case $\alpha_1 = \alpha_2$, the notion of expanding $f(p; \alpha_1, \alpha_2)$ as a power series
in $\alpha_2$ in the neighborhood of $\alpha_1$ suggests itself. Robert R. Bush, in an
unpublished note, developed such a technique.


REFERENCES 


[1] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955. 

[2] Bush, R. R. and Wilson, T. R. Two-choice behavior of paradise fish. J. exp. Psychol., 
1956, 51, 315-322. 

[3] Harris, T. E., Bellman, R., and Shapiro, H. N. Studies in functional equations occurring 
in decision processes. Res. Memo. P-382, The RAND Corp., Santa Monica, Calif., 1953. 

[4] Karlin, S. Some random walks arising in learning models I. Pacific J. Math., 1953, 3, 
725-756. 


Manuscript received 1/9/59 
Revised manuscript received 6/29/59 













PSYCHOMETRIKA—VOL. 25, NO. 1 
MARCH, 1960 


REMARKS ON TUCKER’S INTER-BATTERY 
METHOD OF FACTOR ANALYSIS 


W. A. GIBSON


PERSONNEL RESEARCH BRANCH 
THE ADJUTANT GENERAL’S OFFICE* 


Tucker's inter-battery method of factor analysis is shown to assume
implicitly that, within the factor space of overlap between the vector
configurations of the two test batteries involved, the locations of the two sets of
principal axes and the sizes of the associated characteristic roots are identical.
A suggested modification of the inter-battery method to avoid these
restrictions is considered.


Tucker has presented an inter-battery method of factor analysis that
"depends on a finding that factor matrices on reference factors can be
determined for the two batteries from just the matrix of correlations $R_{12}$
between the two batteries ... It is to be noted that only the factors common
to the two batteries are obtained and not factors that are represented in
only one of the two batteries" ([1], p. 113). The fundamental factor theorem
$R = FF'$ is taken as the starting point of the development, but no additional
conditions are stated for the obtaining of the orthogonal factor matrix of
each battery. The purpose of this note is to show that additional constraints
are involved in the inter-battery method, and to demonstrate some difficulties
that can arise when these conditions are not met. The latter objective will
be undertaken first with the help of a fictitious example. The former will
then be shown in terms of certain algebraic and geometric considerations.
Finally a suggestion by a prepublication reviewer as to how the inter-battery
method might be modified in order to avoid these constraints will be discussed.

The fictitious two-dimensional example is pictured in Figure 1, and the
corresponding intercorrelations ($R$) and an orthogonal factor matrix ($F_0$)
are shown in Table 1. Tests 1-4 form battery 1, and tests 5-8 form battery 2.
The intercorrelations for battery 1 are $R_{11}$, the intercorrelations for battery
2 are $R_{22}$, and the correlations between battery 1 and battery 2 are $R_{12}$ or
$R_{21}$, one being the transpose of the other. The orthogonal factorization $F_0$,
containing only real numbers as elements, reproduces perfectly (except for
rounding error) not only $R_{11}$ and $R_{22}$, but also $R_{12}$ and $R_{21}$. The vector
configurations corresponding to the two test batteries span the same real
two-space, and the occurrence of the test vectors of each battery in identical
pairs rules out any possibility that the problem of estimation of diagonal
elements can be held responsible for some difficulties that will arise later.
When two columns, j and k, of a correlation matrix are identical, the
corresponding diagonals for the common factor space must be identical and
equal to the correlation between tests j and k.


FIGURE 1
Vector Configuration for a Fictitious Two-Factor Example


*The opinions expressed are those of the author and are not to be construed as
reflecting official Department of Army policy.

The inter-battery method may be applied by forming two symmetric
matrices,

$$H_1 = R_{12}R_{21} \tag{1}$$

and

$$H_2 = R_{21}R_{12} \tag{2}$$

([1], eq. 6.1 and 6.2) and obtaining their characteristic roots and vectors.
These are all shown in Table 2. The 4 × 2 matrices $W_1$ and $W_2$ give,
respectively, the characteristic vectors of $H_1$ and $H_2$ corresponding to their
non-vanishing characteristic roots. The non-vanishing characteristic roots
of $H_1$ and $H_2$ are identical, and these appear in the diagonal cells of the
2 × 2 diagonal matrix $\gamma^2$ in Table 2. In the diagonal cells of the 2 × 2 diagonal
matrix $\gamma$ of Table 2 appear the square roots of the corresponding diagonal
cells in $\gamma^2$. The first diagonal element in $\gamma$ must be taken as the negative
square root of the corresponding element in $\gamma^2$ in order to satisfy

$$R_{12} = A_1 A_2' \tag{3}$$

([1], eq. 5). In the diagonal cells of the 2 × 2 diagonal matrix $\gamma^{1/2}$ of Table 2
are shown the square roots of the corresponding diagonal elements in $\gamma$.
The first diagonal element in $\gamma^{1/2}$ is imaginary because it is the square root of a
negative number.

















TABLE 1

Intercorrelations ($R$) and Orthogonal Factor Matrix ($F_0$)
for a Fictitious Two-Factor Example

                      R                                              $F_0$
             Battery 1                 Battery 2
Test     1      2      3      4      5      6      7      8        I      II
  1    1.000  1.000   .000   .000   .566   .566  -.707  -.707    1.000   .000
  2    1.000  1.000   .000   .000   .566   .566  -.707  -.707    1.000   .000
  3     .000   .000   .640   .640   .453   .453   .566   .566     .000   .800
  4     .000   .000   .640   .640   .453   .453   .566   .566     .000   .800
  5     .566   .566   .453   .453   .640   .640   .000   .000     .566   .566
  6     .566   .566   .453   .453   .640   .640   .000   .000     .566   .566
  7    -.707  -.707   .566   .566   .000   .000  1.000  1.000    -.707   .707
  8    -.707  -.707   .566   .566   .000   .000  1.000  1.000    -.707   .707


TABLE 2

Determination of Inter-Battery Reference Factors
for a Fictitious Two-Factor Example

                $H_1$                      $W_1$              $A_1$
Test     1      2      3      4        I      II        I       II
  1    1.640  1.640  -.288  -.288     .655   .267     .897i    .312
  2    1.640  1.640  -.288  -.288     .655   .267     .897i    .312
  3    -.288  -.288  1.051  1.051    -.267   .655    -.366i    .766
  4    -.288  -.288  1.051  1.051    -.267   .655    -.366i    .766

                $H_2$                      $W_2$              $A_2$
Test     5      6      7      8        I      II        I       II
  5    1.051  1.051  -.288  -.288    -.267   .655    -.366i    .766
  6    1.051  1.051  -.288  -.288    -.267   .655    -.366i    .766
  7    -.288  -.288  1.640  1.640     .655   .267     .897i    .312
  8    -.288  -.288  1.640  1.640     .655   .267     .897i    .312

          $\gamma^2$               $\gamma$               $\gamma^{1/2}$
         I      II          I      II          I       II
  I    3.517   .000      -1.875   .000       1.369i   .000
  II    .000  1.866        .000  1.366        .000   1.169


TABLE 3

Reproduction of Battery Intercorrelations ($R_{11}$ and $R_{22}$) by the
Inter-Battery Reference Factors of a Fictitious Two-Factor Example

              $A_1 A_1'$                            $A_2 A_2'$
Test     1      2      3      4      Test     5      6      7      8
  1    -.707  -.707   .567   .567      5     .453   .453   .567   .567
  2    -.707  -.707   .567   .567      6     .453   .453   .567   .567
  3     .567   .567   .453   .453      7     .567   .567  -.707  -.707
  4     .567   .567   .453   .453      8     .567   .567  -.707  -.707

          $R_{11} - A_1 A_1'$                      $R_{22} - A_2 A_2'$
Test     1      2      3      4      Test     5      6      7      8
  1    1.707  1.707  -.567  -.567      5     .187   .187  -.567  -.567
  2    1.707  1.707  -.567  -.567      6     .187   .187  -.567  -.567
  3    -.567  -.567   .187   .187      7    -.567  -.567  1.707  1.707
  4    -.567  -.567   .187   .187      8    -.567  -.567  1.707  1.707




Also shown in Table 2 are the two factor matrices $A_1$ and $A_2$ that are
obtained, column by column, from $W_1$, $W_2$, and $\gamma^{1/2}$ by means of

$$A_1 = W_1 \gamma^{1/2} \tag{4}$$

and

$$A_2 = W_2 \gamma^{1/2} \tag{5}$$

([1], eq. 23.1 and 23.2). $A_1$ and $A_2$ satisfy (3) except for rounding error.
The first column of $A_1$ and of $A_2$ must contain imaginary factor loadings
in order that (3) may hold; this is the reason why $\gamma_1$ had to be taken as the
negative square root of $\gamma_1^2$.

The emergence of imaginary factor loadings in an application of the
inter-battery method is startling enough, but let us go further and determine
how well $R_{11}$ and $R_{22}$ are fitted by $A_1 A_1'$ and $A_2 A_2'$, respectively, in an example
known to have a perfectly fitting and real two-column factorization. The
matrices $A_1 A_1'$ and $A_2 A_2'$ are shown across the top in Table 3, and beneath
them are given the two residual tables $R_{11} - A_1 A_1'$ and $R_{22} - A_2 A_2'$. Here
are found some negative communalities implied by $A_1$ and $A_2$, and some
residuals exceeding unity, while other residuals are alarmingly large.
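
The arithmetic behind these statements can be retraced with complex numbers. The sketch below is an illustration only: it takes the three-decimal values of $F_0$, $W_1$, $W_2$, and $\gamma$ for the fictitious example as given constants (so agreement is only to rounding error), forms $A_1$ and $A_2$ by (4) and (5), and uses a plain, non-conjugate transpose throughout, as the matrix equations in the text do. All variable and function names are hypothetical.

```python
import cmath

# Orthogonal factor matrix F0 (tests 1-8 by factors I, II), three decimals.
F0 = [[1.000, 0.000], [1.000, 0.000], [0.000, 0.800], [0.000, 0.800],
      [0.566, 0.566], [0.566, 0.566], [-0.707, 0.707], [-0.707, 0.707]]

# Characteristic vectors and the diagonal of gamma, three decimals.
W1 = [[0.655, 0.267], [0.655, 0.267], [-0.267, 0.655], [-0.267, 0.655]]
W2 = [[-0.267, 0.655], [-0.267, 0.655], [0.655, 0.267], [0.655, 0.267]]
GAMMA = [-1.875, 1.366]


def outer(X, Y):
    """Return X Y' using a plain (non-conjugate) transpose."""
    return [[sum(x * y for x, y in zip(rx, ry)) for ry in Y] for rx in X]


def scale_columns(W, gamma):
    """A = W gamma^(1/2); the negative first root gives an imaginary column."""
    roots = [cmath.sqrt(g) for g in gamma]
    return [[w * r for w, r in zip(row, roots)] for row in W]


A1 = scale_columns(W1, GAMMA)
A2 = scale_columns(W2, GAMMA)

R = outer(F0, F0)
R11 = [row[:4] for row in R[:4]]
R12 = [row[4:] for row in R[:4]]

fit12 = outer(A1, A2)   # equation (3): should reproduce R12
fit11 = outer(A1, A1)   # within-battery products implied by A1
residual11 = [[R11[i][j] - fit11[i][j].real for j in range(4)]
              for i in range(4)]

if __name__ == "__main__":
    print("implied communalities:",
          [round(fit11[i][i].real, 3) for i in range(4)])
    print("residual for test 1:", round(residual11[0][0], 3))
```

Under these assumed values the product $A_1 A_2'$ matches $R_{12}$ to rounding error, while the diagonal of $A_1 A_1'$ is negative for tests 1 and 2, which is the non-Gramian behavior described in the paragraph above.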

Inspection of certain formal properties of the factor matrices $A_1$ and $A_2$
is necessary to achieve an understanding of what has gone wrong in the
foregoing example. Since $W_1$ and $W_2$ are matrices of characteristic vectors
(of $H_1$ and $H_2$, respectively), they have the property that

$$W_1' W_1 = I \tag{6}$$

and

$$W_2' W_2 = I \tag{7}$$

(cf. [2], p. 501). Then, by virtue of equations (4) and (5), $A_1$ and $A_2$ will
always have the property that

$$A_1' A_1 = \gamma^{1/2} W_1' W_1 \gamma^{1/2} = \gamma \tag{8}$$

and

$$A_2' A_2 = \gamma^{1/2} W_2' W_2 \gamma^{1/2} = \gamma. \tag{9}$$

This property is unique to principal components factor matrices (cf. [2],
p. 501). Thus $A_1$ and $A_2$ are both principal components factor matrices for
two different vector configurations whose characteristic roots are identical.
Furthermore, since by (3) the scalar product matrix $R_{12}$ is reproduced by
the simple product of the two matrices of rectangular Cartesian coordinates
$A_1$ and $A_2'$, the orthogonal reference frames implied by $A_1$ and $A_2$ must be
identically located in the very same space. Thus it becomes clear that the
inter-battery method treats $R_{12}$ as defining a space in which the two vector
configurations have the same principal components reference frame and
the same sums of squared projections thereon. That space, however, may
be complex or otherwise different from the factor space of overlapping vector
configurations. The residual matrices in Table 3 are thus interpretable as
showing, for the present example, the difference between the within-battery
scalar products of these two different spaces. Alternatively, Tucker's way of
presenting the inter-battery method, in which $A_1$ and $A_2$ are treated as
orthogonal factor matrices for the overlapping factor space (cf. [1], eq. 2
and 3), assumes implicitly that, within that factor space, the two sets of
principal axes and the associated characteristic roots coincide. If these
assumptions are not met, then $A_1$ and $A_2$, while having the formal properties
expressed in (8) and (9), may exhibit such properties as the non-Gramian
properties and poor reproduction of $R_{11}$ and $R_{22}$ that have been seen in our
example. Inspection of Figure 1 indicates that the two principal components
reference frames are differently located in the factor space common to the two
vector configurations.

It is interesting to consider why the illustrative example used by Tucker
([1], p. 114) did not exhibit difficulties of the kind encountered here. Gross
inspection of Tucker's example suggests that the makeup of the two batteries
may have been such that the two sets of principal axes were not too
dissimilarly located within the factor space implied by $R_{12}$, and that the
corresponding characteristic roots were not too different. Tucker's example
certainly did not show the non-Gramian properties encountered here. His
example might have departed enough from the inherent assumptions discussed
here to give even poorer reproduction of $R_{11}$ and of $R_{22}$ than even the presence
of factors specific to either battery would suggest, or than might be expected
from the fact that $R_{11}$ and $R_{22}$ were not used in the solution while $R_{12}$ was
subjected to least squares fitting.

In the abstract of his paper ([1], p. 111) Tucker indicates that the two
batteries need not be made up of parallel tests. While this is true, the
considerations discussed here suggest that, for safe application of the
inter-battery method as presented by Tucker, there would have to exist a parallelism
of the sub-batteries of tests representing each simple structure factor, in
the sense that their numbers or their sums of squared loadings on the factor
could not be too different for any factor. Otherwise the two sets of principal
axes would be quite differently located. In two dimensions the proper selection
of the two batteries might not be too difficult to achieve; with three or more
factors, the additional degrees of freedom afforded the two reference frames
could easily and frequently lead to a non-Gramian and poorly fitting (of $R_{11}$
and $R_{22}$) solution. These considerations cast grave doubt upon the general
applicability of the inter-battery method in its original form.

A prepublication reviewer has pointed out that, although Tucker's
$A_1$ and $A_2$ may not themselves constitute proper orthogonal factor matrices,
each is separated therefrom only by a linear transformation, one of these
transformations being the transposed inverse of the other. In the particular
case of the present example, he has provided a pair of such transformations,
both containing some imaginary elements. When respectively applied as
post-multipliers to the $A_1$ and $A_2$ of Table 2, these yield the upper and lower
parts, respectively, of the $F_0$ of Table 1.

More generally, the reviewer has suggested that an appropriate pair of
orthogonal factor matrices, $F_1$ and $F_2$, might be obtainable from $A_1$ and $A_2$
by means of the equations,

$$F_1 = A_1 T \tag{10}$$

and

$$F_2 = A_2 (T^{-1})', \tag{11}$$

where $T$ is a linear transformation to be solved for by some means. That
$F_1$ and $F_2$ reproduce $R_{12}$ can be seen from the insertion of the product
$TT^{-1} = I$ into the right side of equation (3), i.e.,

$$R_{12} = A_1 T T^{-1} A_2' = (A_1 T)\left(A_2 (T^{-1})'\right)' = F_1 F_2'. \tag{12}$$

The suggested method of solution for $T$ is to have it be such that, by virtue
of (10) and (11),

$$F_1 F_1' = A_1 T T' A_1' \cong R_{11} \tag{13}$$

and

$$F_2 F_2' = A_2 (T^{-1})' T^{-1} A_2' = A_2 (TT')^{-1} A_2' \cong R_{22}, \tag{14}$$

from which least squares expressions for $TT'$ and $(TT')^{-1}$ could be obtained
by the usual techniques. If both of these expressions were to be used to
obtain the desired transformations, they would somehow need adjusting so
that the one would be the inverse of the other. Initially they would not be
compatible in this sense because of having been derived from different
empirical data. The expression for $TT'$ could be arbitrarily factored to
obtain a $T$, and then $(T^{-1})'$ could be found by forming the product

$$(TT')^{-1} T = (T')^{-1} T^{-1} T = (T')^{-1} = (T^{-1})'. \tag{15}$$
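
The two matrix identities used here, (12) and (15), are easy to confirm numerically. The sketch below does so for a small two-factor case; the loadings `A1`, `A2` and the transformation `T` are arbitrary made-up numbers chosen only so that `T` is nonsingular, and the helper functions are written out by hand rather than taken from any library.

```python
def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse_2x2(T):
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical loadings and transformation, for illustration only.
A1 = [[0.9, 0.1], [0.2, 0.7], [0.5, 0.5]]
A2 = [[0.6, 0.3], [0.1, 0.8]]
T = [[2.0, 1.0], [0.5, 1.5]]

T_inv_t = transpose(inverse_2x2(T))       # (T^-1)'
F1 = matmul(A1, T)                        # equation (10)
F2 = matmul(A2, T_inv_t)                  # equation (11)

R12_from_A = matmul(A1, transpose(A2))    # A1 A2'
R12_from_F = matmul(F1, transpose(F2))    # F1 F2', equation (12)

# Equation (15): (TT')^-1 T should equal (T^-1)'.
lhs15 = matmul(inverse_2x2(matmul(T, transpose(T))), T)

if __name__ == "__main__":
    print(R12_from_A)
    print(R12_from_F)
```

Any nonsingular `T` leaves the between-battery product unchanged, which is why, as the text goes on to say, $R_{12}$ alone cannot determine the transformation.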


Since (13) and (14) contain $R_{11}$ and $R_{22}$, their use in obtaining the
transformations needed to form an $F_1$ and an $F_2$ would inject these two
matrices, including their diagonals, into inter-battery factor extraction for
the first time. In this context it seems clear that the appropriate diagonals
for $R_{11}$ and $R_{22}$ would be the communalities. Thus, contrary to several
statements by Tucker with reference to his method in its original form ([1],
abstract on p. 111, p. 113, and p. 135), communality estimation would enter
into inter-battery factor extraction as modified by the reviewer's suggestion.
This might eliminate much of the theoretical advantage of inter-battery
factor extraction. On the other hand, it might turn out that errors in
communality estimation using this approach would be less crucial than with
the more conventional factoring methods, and that this would justify using
the somewhat less direct procedures involved in inter-battery factor
extraction.

Another possible difficulty with this approach is that when factors
different from those required to account for $R_{12}$ are needed for $R_{11}$ or $R_{22}$
or both, the least squares expressions for $TT'$ and $(TT')^{-1}$ would be striving
to fit $R_{11}$ and $R_{22}$ rather than to provide the transformations most appropriate
to the factors involved in $R_{12}$. Just how this occurrence might manifest
itself is not easy to see, for certainly $R_{12}$ is as well reproduced with one $T$ as
another, and the least squares properties of the solution might be powerful
enough to mask the fact that additional factors are needed for $R_{11}$ or $R_{22}$ or
both. Conceivably imaginary loadings or communalities exceeding unity could
occur, though perhaps only suspiciously small or large communalities would
result. Some amount of care in forming the two batteries might need to be
exercised in order to avoid such distortions.

The problems in the use of $R_{11}$ and $R_{22}$ to obtain $T$ and $(T^{-1})'$ might
be partially avoided by using only $R_{11}$, or only $R_{22}$, and by judicious choice
of which one to use. The price paid would be, of course, a disparity in
goodness of fit between the two sets of within-battery correlations. It might be
that $T$ and $(T^{-1})'$ could be obtained by procedures, yet to be derived, that
make no use of $R_{11}$ and $R_{22}$. For example, reasonable upper and lower limits
on the communalities derived from $R_{12}$ alone, coupled with the requirement
that all loadings be real, might impose powerful enough constraints on $T$
and $(T^{-1})'$ that some kind of "centering" within the range of permissible
solutions would be satisfactory. In any event, the idea of transforming, by
whatever method, Tucker's $A_1$ and $A_2$ into an acceptable $F_1$ and $F_2$ might
be the means by which inter-battery factor extraction, as a way of solving
the communality problem, could achieve general applicability.


REFERENCES 


[1] Tucker, L. R. An inter-battery method of factor analysis. Psychometrika, 1958, 23,
111-136.
[2] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.


Manuscript received 3/13/59 
Revised manuscript received 7/25/59 








PSYCHOMETRIKA—VOL, 25, No. 1 
MARCH, 1960 


MULTIDIMENSIONAL UNFOLDING: DETERMINING THE 
DIMENSIONALITY OF RANKED PREFERENCE DATA 


JOSEPH F. BENNETT*


LINCOLN LABORATORIES, MASSACHUSETTS INSTITUTE 
OF TECHNOLOGY 


AND 
WILLIAM L. HAYS


UNIVERSITY OF MICHIGAN 


A model is proposed which treats rankings given by a group of judges as
representing regions in an isotonic space of dimensionality r. Three possible
criteria for estimating lower bound dimensionality are discussed: mutual
boundary, cardinality, and the occurrence of transposition groups. Problems
associated with each criterion are mentioned.


The task of a psychological scaling technique is to search for some 
form of lawfulness, i.e., redundancy, in experimentally collected data. This 
redundancy, when it is present, permits a description of the items (also 
perhaps of the subjects) which is simpler than an exhaustive account of 
the response of every subject to every item, and yet tells the experimenter 
everything he wants to know about his data. If the scaling technique is the 
Guttman scale [5], for example, the experimenter hopes to find that the score 
of one of his subjects will tell him not simply how many items the subject 
passed but which items he passed, within some reasonable margin for error. 
Sometimes this scaling process is an end in itself; at other times the items or 
the people are scaled in order to “calibrate” them for application in some 
other context. In either case, the scaling technique chosen for a particular 
application must be appropriate to the task given the subjects, i.e., whether 
they were asked to agree with the items, pass them, rank them, or compare 
them in pairs, etc. 

This paper is concerned with a scaling technique designed for the analysis
of ranked preference data. The subject is asked to rank a group of items,
for example, the names of hobbies. The model states that each hobby can
be characterized by its position on each of several underlying attributes
(e.g., scientific-artistic, solitary-gregarious, skilled-unskilled, etc.). The
model states further that every subject can be characterized by his own
maximum preferences on each of these attributes, and that he will rank
the hobbies according to their increasing distances from the ideal hobby
defined by his own maximum preference on each attribute; e.g., the
scientific, gregarious, skilled subject will probably give model-airplane
racing a high rank, photography a middling rank, and finger painting a low
rank.


*Deceased.

The critical problem in constructing such a model is the selection of 
a weighting scheme by which the subject is hypothetically supposed to 
combine the attributes in judging the distance of a given hobby from his 
own ideal hobby. The scheme proposed here is the simplest available: let 
the attributes be the axes of a multidimensional space, and interpret 
"distance" literally as the distance from the point representing the subject's
ideal hobby, located by its projection on the axes, to another point represent- 
ing one of the hobbies listed in the questionnaire. 
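
The weighting scheme just described amounts to sorting the items by their distance from the subject's ideal point. A minimal sketch follows; the three attribute axes, the coordinates assigned to each hobby, and the function name `rank_by_distance` are all invented here for illustration, and ordinary Euclidean distance is assumed even though the model itself needs only the ordering of distances.

```python
import math

# Hypothetical coordinates on three attributes, each scaled to [0, 1]:
# (artistic -> scientific, solitary -> gregarious, unskilled -> skilled).
HOBBIES = {
    "model-airplane racing": (0.9, 0.8, 0.9),
    "photography": (0.5, 0.3, 0.7),
    "finger painting": (0.1, 0.4, 0.1),
}


def rank_by_distance(ideal, items):
    """Return item names ordered by increasing Euclidean distance from ideal."""
    return sorted(items, key=lambda name: math.dist(ideal, items[name]))


if __name__ == "__main__":
    scientific_gregarious_skilled = (1.0, 1.0, 1.0)
    for rank, hobby in enumerate(
            rank_by_distance(scientific_gregarious_skilled, HOBBIES), start=1):
        print(rank, hobby)
```

Only the order returned by the function matters to the model; the numerical distances themselves are never used.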

Before elaborating these points, some questions which this introduction
is likely to raise will be considered. First, is this model intended for
application to the sort of preference data ordinarily analyzed by the method of
rank-order [7]? No, it is not; there the task is to uncover an underlying order
of popularity of the stimuli. The analysis presupposes that all subjects agree
on this order and that all would give it, were it not for random errors. In
other words, only the stimuli are being scaled, the people are not. The present
model is usable only where subjects differ fundamentally in their preferences
for the items, though they all view these items substantively the same within
some common system of attributes. To choose another illustration, this
model would be appropriate to the analysis of voters' preferences for political
candidates if there were reason to suppose that each voter could be
characterized by his position on, let us say, the conservatism-liberalism and
isolationism-internationalism continua, and that he valued the available
candidates in proportion as they approximated his own position on these
continua. Within the "stimulus space," the voters might agree perfectly
as to how the candidates differ from each other: it is required that the voters
themselves differ only in their preferences among the candidates. In this
system, as in Guttman scaling, the subjects as well as the items may be
scaled.

Second, is it really necessary that these continua be agreed upon by all
the subjects, that is, that everyone have the same conception of the liberality
of Senator Jones or the gregariousness of photography? Yes, fundamentally
it is, with some qualification since this method is nonmetric. It will deal
throughout only with rankings, and will never attempt to determine numerical
values for the distances between points; only relative distances in the form
of rank orders will be considered. Within the relative freedom of a nonmetric
model there is allowance for some minor differences between subjects about
the structure of the space. But since the point of the method is to discover
this common underlying structure on which the model supposes the subject's
responses to be based, if the structure is not there it will not be found and
the method will fail. This is true of every other similar system, from Guttman
scaling to factor analysis. Of course when one speaks of agreement between
subjects about the structure of the space one does not mean explicit verbal
agreement, any more than a subject taking an IQ test would be expected
to be aware of the factorial compositions of the items. As in factor analysis,
it is necessary to assume that the factorial composition of an item is the
same for all subjects.


Origins in the Unfolding Method 


It will be evident to the reader acquainted with scaling literature that 
the model proposed above is simply a multidimensional generalization of 
Coombs’ method of unfolding [2, 3]. Coombs supposes that both subjects 
and items can be represented by points on a line segment (the single attribute 
under investigation). Each subject ranks all of the items in order of their 
increasing distances from his own position. Since he is concerned only with 
ordering the magnitudes of these distances and does not care about their 
directions, the subject might be said to pick up the continuum at his own 
position as one might pick up a piece of string, letting the ends swing together 
and fuse. The analytic task is the unfolding of these rankings (whence the 
name), in order to recover the original ordering of subjects and items on 
the continuum. The method used to accomplish this task, described in detail 
in the papers cited, requires no definition of ‘‘zero,” “addition,” or the ‘unit 
interval,” and in fact employs no properties of the real line except the ordering 
of its points and the comparability of (some) intervals in magnitude. Internal 
checks provide that even these assumptions shall justify themselves in 
practice; that is, the unfolding method is a scaling criterion as well as a 
scaling method, and any given set of rankings may or may not unfold. 

It is apparent that the only change introduced by the present authors 
is the replacement of Coombs’ line segment by a space of dimension r, which 
may equal one and may be greater than one. The following section will 
explain a number of ways of determining r from the data. It will be assumed 
that distances in this r-dimensional space can be compared in magnitude, 
so that it is meaningful to say “‘it is farther from A to B than from C to D,” 
where A, B, C, and D are points in the space. 


Some Definitions 


Let the following definitions refer to the familiar Euclidean space, 
although (as the reader will see) all the Euclidean properties will not be 
needed. 

The set C of subjects $c_1, \dots, c_i, \dots, c_m$ and the set Q of items or
objects $q_1, \dots, q_j, \dots, q_n$ are regarded as sets of points in a space of
dimension r. When a system of reference axes is inserted in the space, each subject
or object point can be characterized by an ordered r-tuple of real numbers,
e.g., $q_j = (q_{1j}, q_{2j}, \dots, q_{rj})$, which are its projections on the axes. The
task of each subject is to rank the whole set of objects according to their 
increasing distances from his own position. 
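
The ranking task can be illustrated with a short Python sketch. It assumes ordinary Euclidean distance; the function name `rank_objects`, the coordinates, and the alphabetical tie-break are hypothetical choices made only for this example and are not part of the original model.

```python
import math

def rank_objects(ideal, objects):
    """Return object labels ordered by increasing distance from the ideal point.

    `ideal` is an r-tuple of coordinates; `objects` maps a label to an r-tuple.
    Exact ties are broken alphabetically, an assumption made only to keep the
    output deterministic.
    """
    return "".join(
        sorted(objects, key=lambda label: (math.dist(ideal, objects[label]), label))
    )

# Hypothetical configuration of three objects in two dimensions.
objects = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 3.0)}

print(rank_objects((1.0, 0.5), objects))  # a subject whose ideal lies near A
print(rank_objects((3.5, 0.5), objects))  # a subject whose ideal lies near B
```

Two subjects at different positions generally return different rankings; the sections that follow ask which sets of rankings a given dimension can produce.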

Consider what position in space a subject’s ideal must occupy in order 
to present a particular ranking of the objects. It is clear that relative to any 
two objects, A and B, a subject may make one of three reports: A is preferred
to B, symbolized A -> B (which means that A is closer to the subject
than is B); the converse, B -> A; and third, that he is undecided between 
them, symbolized A = B, which means that the subject’s ideal is equidistant 
from A and B. In one-space (that is, an infinite straight line), there is only 
one point exactly equidistant from two distinct points, which is called their 
midpoint. All the points on the line to one side of that midpoint will be 
closer to one of the objects, and all on the opposite side will be closer to the 
other object. Only a subject with an ideal lying exactly on the midpoint 
would report A = B, that is, indecision between the two. (See Figure 1.) In the 
unidimensional case explored by Coombs, the whole one-space is segmented 
by these midpoints into regions, within each of which every subject will 
report the same ranking of the objects, although within any one region 
different subjects’ ideals may lie at different absolute distances from the 
objects. 


[Figure: a line through objects A and B, marked at MIDPOINT AB; on the side containing A lies the region in which A -> B, on the other side the region in which B -> A.]

Figure 1
Midpoints and Regions Generated by Two Stimulus Points
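
The segmentation of one-space by midpoints can be sketched in Python as follows. The coordinates are hypothetical, and `rankings_on_line` is a name introduced only for this illustration: it places one probe point inside each interval between neighbouring midpoints and ranks the objects from that probe.

```python
from itertools import combinations

def rankings_on_line(positions):
    """List the rank order found in each region of a one-dimensional space.

    `positions` maps an object label to its coordinate on the line. The
    midpoints of all pairs cut the line into intervals; one probe point is
    taken inside each interval and the objects are ranked by distance from it.
    """
    midpoints = sorted({(positions[a] + positions[b]) / 2
                        for a, b in combinations(positions, 2)})
    # One probe left of all midpoints, one between each neighbouring pair of
    # midpoints, and one right of all of them.
    probes = [midpoints[0] - 1.0]
    probes += [(lo + hi) / 2 for lo, hi in zip(midpoints, midpoints[1:])]
    probes.append(midpoints[-1] + 1.0)
    return ["".join(sorted(positions, key=lambda k: abs(positions[k] - x)))
            for x in probes]

three = rankings_on_line({"A": 0.0, "B": 1.0, "C": 3.0})
four = rankings_on_line({"A": 0.0, "B": 1.0, "C": 3.0, "D": 7.0})
print(three)
print(len(four))
```

With three objects only four of the six conceivable rankings appear, and with four objects only seven of the twenty-four, which is the kind of restriction the dimensional criteria below exploit.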


In two-space (a plane), the locus of equidistance from two objects will 
be a line, the perpendicular bisector of the line segment connecting those 
objects. All subjects with ideals in the plane on one side of this line, the side 
containing the object A, will report A -> B; all those on the opposite side 
will report B -> A; only those exactly on the line will report A = B. In 
general the locus of equidistance from any two distinct objects will be a surface 
of dimension one less than the dimension of the total space in which the 
objects occur, i.e., a hyperplane. Hence the locus of equidistance from two 
distinct objects A and B is the boundary hyperplane H(A, B). Since it will 
also be convenient to speak of loci of equidistance from sets of more than two 
objects, this notation will be generalized: H(A, B, ..., N) will symbolize
the locus of equidistance from a set of n objects (A, B, C, ..., N) in r dimensions,
which will be a subspace (hyperplane) of some dimensionality less than r.
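
That the locus of equidistance from two points is a hyperplane follows from a short calculation, sketched here in coordinates under the Euclidean metric. The vectors $a$, $b$, and $x$ (the positions of A, B, and an arbitrary point) are notation introduced only for this illustration.

```latex
\begin{aligned}
\lVert x - a \rVert^{2} &= \lVert x - b \rVert^{2} \\
x \cdot x - 2\, a \cdot x + a \cdot a &= x \cdot x - 2\, b \cdot x + b \cdot b \\
2\,(b - a) \cdot x &= b \cdot b - a \cdot a .
\end{aligned}
```

The quadratic terms cancel, leaving a single linear equation in $x$: a hyperplane with normal $b - a$ that passes through the midpoint $(a + b)/2$.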

Between three objects (A, B, C) in two-space not all on the same line,
there will be three such boundary hyperplanes, H(A, B), H(A, C), H(B, C),
which will meet in a point equidistant from all three objects, H(A, B, C)
(see Figure 2). In the same fashion, any four non-coplanar objects in three-space
(A, B, C, D) determine a point equidistant from all of them, H(A, B,
C, D), through which all six of their boundary hyperplanes of the form H(A, B)
must pass. In a space of r dimensions, there will always be one and only one
point equidistant from r + 1 objects which are scattered (that is, which
do not all lie on a subspace of the r-space).

Figure 2
Boundary Hyperplanes Generated by Three Points in Two Dimensions
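
The unique equidistant point for r + 1 scattered objects can be computed by solving r linear equations. The following Python sketch is one way to do so, assuming Euclidean coordinates; the function name `equidistant_point` and the sample points are hypothetical, and exact rational arithmetic is used only to keep the result free of rounding.

```python
from fractions import Fraction

def equidistant_point(points):
    """Solve for the point equidistant from r + 1 scattered points in r-space.

    Subtracting the squared-distance equation of the first point from that of
    each other point gives r linear equations
        2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2,
    solved here by Gauss-Jordan elimination.
    """
    pts = [[Fraction(c) for c in p] for p in points]
    r = len(pts[0])
    if len(pts) != r + 1:
        raise ValueError("need exactly r + 1 points in r dimensions")
    p0 = pts[0]
    sq0 = sum(c * c for c in p0)
    # Augmented matrix [A | b], one row per point other than the first.
    rows = [[2 * (pi[j] - p0[j]) for j in range(r)] + [sum(c * c for c in pi) - sq0]
            for pi in pts[1:]]
    for col in range(r):
        pivot = next((i for i in range(col, r) if rows[i][col] != 0), None)
        if pivot is None:
            raise ValueError("points are not scattered")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for i in range(r):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [vi - factor * vc for vi, vc in zip(rows[i], rows[col])]
    return tuple(row[r] for row in rows)

center = equidistant_point([(0, 0), (4, 0), (0, 2)])
print(center)
```

If the objects are not scattered, the elimination finds no pivot and the sketch raises an error, mirroring the condition stated in the text.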

These observations imply a general conclusion: in a space of r dimensions, 
if a locus of equidistance exists for some set of n points in general position 
(i.e., well scattered throughout the space), the locus will be a subspace of 
dimensionality r — n + 1. While this is actually a rather trivial result, it 
may be informative to sketch out a proof. It will be necessary to draw on 
a well-known feature of the geometry of higher spaces. The intersection of 
two r-dimensional subspaces S and T, neither of which is a subspace of the
other, is a subspace of both S and T having r - 1 dimensions. The general
outline for a proof using a form of induction argument on n, the number of 
points, would go as follows. 







For a set of two points, A and B, embedded in r dimensions, consider
a line L joining A and B. Now let there be some set of r orthogonal reference
axes $(X_1, X_2, \dots, X_r)$ such that $X_1$ is collinear with L. Let the origin be at
the midpoint between A and B on L. The subspace everywhere orthogonal
to $X_1$ must be a hyperplane of dimensionality r - 1. Now consider any
point p which lies in this (r - 1)-dimensional hyperplane, such that p projects onto the
origin on $X_1$. It follows that the squared distance from p to A equals that
from p to B as projected on $X_1$. Furthermore, A and B each project onto the
origin on the (r - 1)-dimensional hyperplane, so that the squared distances between p and A
and p and B are equal as projected on the hyperplane. Thus, any point p in
the hyperplane lies, by definition, in the locus of equidistance H(A, B).
Furthermore, only points in the hyperplane of r - 1 dimensions fit the
definition of H(A, B); otherwise the point p could not project equally distant
from A and B on axis $X_1$. Since the origin and particular set of reference
axes utilized are completely irrelevant for distance among points in a
Euclidean space, the locus H(A, B) is a hyperplane of dimensionality r - 1;
that is, r - n + 1 dimensions with n = 2.

Now assume the proposition true for n - 1 stimulus points in general
position in r dimensions. A set of n stimulus points (A, B, ..., Q, R) in
general position in r dimensions may be divided into two overlapping sets
of n - 1 stimulus points, (A, B, ..., Q) and (B, C, ..., Q, R). By assumption,
the locus H(A, B, ..., Q) must be a space of dimensionality
r - (n - 1) + 1, or r - n + 2, and H(B, C, ..., R) a space of the same
dimensionality. Suppose that H(A, B, ..., Q) and H(B, C, ..., R) intersect.
Since each of these loci is a subspace of dimensionality r - n + 2, their
intersection must be a subspace of r - n + 1 dimensions. By the transitivity
of the relation of equality, this subspace must also be the intersection
of all the remaining loci of equidistance of n - 1 points drawn from
the original set of n points, so that the r - n + 1 space is by definition the
locus H(A, B, ..., Q, R). If the loci H(A, B, ..., Q) and H(B, C, ..., R)
do not intersect, then by the same transitive property of equality, the locus
of equidistance from the set of n points may not exist. This proves the
proposition.


Regions 


Any one boundary hyperplane H(A, B) divides the whole space into 
two half-spaces, within each of which all points satisfy the same distance 
relation relative to the two objects A and B generating the hyperplane. 
Call one of these zones, that including the object A, the isotonic (same- 
ordered) region AB, (meaning A -> B) and the other, the isotonic region 
BA. If the boundary H(B, C) passes through the region AB, every point 
within the region AB which is also on the B side of the hyperplane H(B, C) 
will lie in the order A -> B -> C from the three objects, so that this isotonic
region may be designated ABC. Indeed, any isotonic region is the set of all
points on the indicated side of each of several boundary hyperplanes, or 
(equivalently) the intersection of a particular set of larger isotonic regions, 
or (equivalently) the set of all points satisfying a given set of distance re- 
lations relative to the objects. For this last purpose, a convenient notation 
is that of partially ordered sets or posets, in which the relation A -> B is 
indicated by writing AB. The division of a two-space by three objects is 
illustrated in Figure 3. It is evident that every isotonic region, being bounded 
by hyperplanes, is everywhere convex, that is, any two points within the 
region can be connected by a straight line which does not pass outside the 
region. 


Figure 3
Regions Generated by Three Points in Two Dimensions


The minimum cells of the space, the regions out of which all other 
regions are constructed, will be those specified by a complete or simple 
ordering of all the objects, of the sort that the subjects in the space will 
make. Call these elemental regions. Thus region A(BC) in Figure 3 is made 
up of the union of two elemental regions, ABC and ACB. The notation 
A(BC) will be adopted here to symbolize the union of the elemental regions 
ABC and ACB—similarly the notation A(BC)D will mean the union of the 
regions ABCD and ACBD, (ABC)D will indicate the union of the regions 
ABCD, BACD, BCAD, CBAD, CABD, ACBD, and so on. 
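
The parenthesis notation can be expanded mechanically. The Python sketch below is an illustration only; `expand` is a hypothetical helper name, and it assumes single-letter object labels with no nested parentheses.

```python
from itertools import permutations, product

def expand(notation):
    """Expand a region name such as 'A(BC)D' into its elemental regions.

    Letters inside parentheses are unordered among themselves, so every
    permutation of them is generated; letters outside keep their place.
    """
    groups = []
    i = 0
    while i < len(notation):
        if notation[i] == "(":
            close = notation.index(")", i)
            inner = notation[i + 1:close]
            groups.append(["".join(p) for p in permutations(inner)])
            i = close + 1
        else:
            groups.append([notation[i]])
            i += 1
    return ["".join(parts) for parts in product(*groups)]

print(expand("A(BC)"))
print(expand("A(BC)D"))
print(sorted(expand("(ABC)D")))
```

Each returned string names one elemental region, so the length of the list is the number of elemental regions united by the notation.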

Any region will wholly contain any other region whose defining poset 
wholly satisfies the poset of the first, with the addition of some further re- 
finement; thus the region AB must contain the region A(BC) if the latter 
exists. On the other hand, two regions cannot intersect, that is, have some
subregion in common, if their posets contain both a relation and its complement.
Thus the region A(BC) cannot intersect the region BA because no
point in the space could obey simultaneously the contradictory orders of 
distance AB and BA, that is, lie simultaneously on both sides of the hyper- 
plane AB. 

When k + 2 objects (or more) occur in a k-space, then certain of the 
(k + 2)! possible elemental regions will disappear. For example, in Figure 4, 
representing three objects in one-space, there is no region ACB and no CAB, 
i.e., the region (AC)B present in Figure 3 has disappeared. It is not an accident 
that those two particular elemental regions are missing. Their absence was 
dictated by the dimension of the space and the configuration of the objects 
in a way which the following sections will define. This dependence will become 
the analytic tool of the method, making it possible to reconstruct dimension 
and configuration from experimentally obtained rankings. 


Determining Dimension 


The first problem to be considered is that of finding r, the dimension of 
the space, given the rankings produced by the subjects. It is not claimed 
that the three methods suggested here exhaust all the possibilities of di- 
mensional analysis, or even that they are best; they simply recount as much 
as the authors know about the problem at the time of writing. 


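[Figure: a line through objects A, B, and C, marked at the midpoints AB, AC, and BC, with the rank order belonging to each resulting region.]

Figure 4
Rank Orders Associated with Regions Generated by Three Stimulus Points in One Dimension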





Dimension by Mutual Boundary 

In the light of the definition of an isotonic region as the set of all points 
satisfying a given set of distance-relations relative to the objects, the origin 
of the term boundary hyperplane is clear. Every isotonic region is bounded 
by segments of boundary hyperplanes, that is, by (r — 1)-dimensional cells. 
In Figure 4, for example, the region BAC is bounded by the hyperplanes 
H(A, B) and H(A, C). In Figure 3, region A(BC) is bounded by segments of 
the hyperplanes H(A, B) and H(A, C). Note that it is not bounded by the 
hyperplane H(B, C). This is because the poset defining the region A(BC) 
specifies no distance-relation between B and C, so that a moving point
inside A(BC) might reverse its distance-relations with B and C (that is,
it might pass through the boundary hyperplane H(B, C)) without leaving
the region A(BC). Of necessity, every hyperplane bounding a given region
is represented by a pair of objects which are adjacent in the defining poset 
of that region. (Two elements of a poset will be said to be adjacent if one 
precedes the other in rank and if there is no other element in the poset 
immediately succeeding the first and preceding the second.) However, the
converse, that every hyperplane defined by a pair of objects adjacent in
the poset actually bounds the region, is not always true. For example, in
Figure 4, region ABC is bounded only by the hyperplane H(A, B). In all 
such cases, any remaining hyperplane defined by an adjacent pair, such 
as BC in Figure 4, constitutes the boundary of a larger region containing 
that region. 

Two isotonic regions bound if they do not intersect, that is, there is no 
point common to both of them, and the same r — 1 cell is part of the set of 
r — 1 cells bounding each. Thus in Figure 4 region ABC bounds the three 
regions BAC, B(AC), and BA. Since all isotonic regions are everywhere 
convex, two isotonic regions can bound only on a single hyperplane or on 
entirely coincident hyperplanes. Hence two isotonic regions must bound, 
granted they both exist, if the defining poset of one can be transformed into 
the poset of the other by the reversal of a single adjacent pair of objects; 
this adjacent pair represents the hyperplane which separates them. The 
possibility of coincident hyperplanes does not permit concluding that isotonic 
regions bound only if such a single-pair transposition of their defining posets 
is possible. If hyperplane coincidence were to occur, passing from one region
into the other would mean simultaneous passage through two (or several)
hyperplanes, and consequently the simultaneous reversal of two (or several)
adjacent pairs. However, the existence of coincident hyperplanes could be
readily detected in the total set of rank orders returned by all the judges
by a complete mutual implication of the defining posets of their isotonic
regions.

Figure 5
Regions Generated by a Configuration of Four Points in Two Dimensions

The value of the concept of boundary lies in its potential application 
as an initial dimensional criterion to indicate whether the space may be 
of dimension one, two, or higher. In a space of dimension one, it is not possible 
for more than two regions to bound one another, and no more than four 
isotonic regions may all bound one another in two-space. (Of course this 
famous “four-color” conjecture has not actually been proved [4], but it may 
probably be used without any great anxiety of its imminent disproof.) The 
two-space limit is illustrated in Figure 5, where four regions all bound one 
another along cells of hyperplanes. The reader may convince himself that 
it is impossible to add a fifth region which will bound all of the existing four. 

A hypothetical example may help to clarify the criterion of mutual 
boundary. Imagine that a group of judges, in evaluating four objects, return 
the following rank orders. 


ABCD    ACBD    ADBC    BCAD    CDAB
ABDC    ACDB    ADCB    CBAD    CDBA
BACD    CABD    DABC    CBDA    DCAB
        CADB    DACB            DCBA


The presence of ABCD and BACD in these data is evidence, for example, 
that A(BCD) bounds B(ACD). Note that in four cases, sets of three regions 
all mutually bound. These are C(DA)B, D(CA)B, A(DC)B; C(AB)D, 
C(AD)B, C(BD)A; A(CD)B, A(BD)C, A(BC)D; A(BC)D, B(AC)D, C(AB)D. 
This excludes at once the possibility that the solution is unidimensional. 
On the other hand, a two-dimensional solution is not excluded, since there 
is no instance where five or more regions mutually bound. In several instances 
sets of four regions do mutually bound—for example, (AB)(CD), (CB)(AD), 
DB, and (CA)BD all bound one another. 
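
Evidence of boundary between elemental regions can be collected mechanically from such data. The Python sketch below is an illustration under the rule stated earlier, that two regions bound when one ranking becomes the other by reversing a single adjacent pair; the names `RANKINGS`, `swapped_pair`, and `bounding` are hypothetical, and only elemental regions (complete rankings) are examined.

```python
from itertools import combinations

# The eighteen rank orders of the hypothetical example.
RANKINGS = """ABCD ACBD ADBC BCAD CDAB ABDC ACDB ADCB CBAD CDBA
BACD CABD DABC CBDA DCAB CADB DACB DCBA""".split()

def swapped_pair(p, q):
    """Return the adjacent pair whose reversal turns p into q, else None."""
    diff = [i for i, (x, y) in enumerate(zip(p, q)) if x != y]
    if len(diff) != 2 or diff[1] != diff[0] + 1:
        return None
    i, j = diff
    if p[i] == q[j] and p[j] == q[i]:
        return "".join(sorted(p[i] + p[j]))
    return None

# Map each bounding pair of observed rankings to the hyperplane between them.
bounding = {}
for p, q in combinations(RANKINGS, 2):
    pair = swapped_pair(p, q)
    if pair is not None:
        bounding[(p, q)] = pair

print(bounding[("ABCD", "BACD")])  # the pair of objects defining the hyperplane
```

The value stored for each pair of rankings names the two objects whose boundary hyperplane separates the corresponding regions.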

It has not so far proved possible to extend the mutual boundary approach 
into spaces of dimension higher than two. Fortunately the existence of more 
powerful methods renders this unnecessary. The method may continue to 
serve as a quick check of the possibility of a solution in one or two dimensions, 
and as a first outline of the configuration of objects in the space. 


Dimension by Cardinality 

The boundary method just outlined operates by the determination of 
a lower bound to the dimensionality of the space; that is, by concluding that 
a given set of rankings must have been given by judges in a space of dimension 
“at least k, and perhaps higher, but not less than k.” In the same fashion,
the following method will serve to impose a lower bound on the possible 
dimension of the space by a comparison of the total number of different
rankings returned by the judges, with C(n, k), the maximum possible number
of elemental isotonic regions that can be generated by n objects in k dimensions.
Thus Coombs has shown that four objects can generate no more than
seven rank orders in one-space; if eight or more distinct rankings were
experimentally obtained, they could not all be accommodated in a
unidimensional solution.

In one-space, there will be one more elemental region than boundary
hyperplanes (i.e., midpoints). Since there will be $\binom{n}{2}$ boundary hyperplanes
for n objects, the maximum cardinality for n objects in one-space is

(1) $C(n, 1) = \binom{n}{2} + 1.$


Special circumstances, such as the coincidence of boundary hyperplanes, 
can serve only to lower the number of regions in the space. Consequently, in 
determining C(n, k), assume as before that the objects are scattered so that
no such special configuration can occur. 

In determining C(n, k) for spaces of dimension k > 1, Professor K. E.
Leisenring of the University of Michigan (personal communication) has
pointed out that when the objects are scattered, every boundary hyperplane
will be subdivided into isotonic regions by boundary hyperplanes in the
same fashion as a complete space of one fewer dimension and containing
one fewer object. Leisenring’s observation may be employed to determine
C(n, k) by considering that the addition of a new object will add as many 
new boundary hyperplanes as there were objects already in the space—one 
between each of the previous objects and the new member. Each of these 
new hyperplanes will, in turn, be responsible for creating as many new regions 
as it intersects, since each of the intersected regions will be cut into two. 
But these regions may be readily counted by the number of (k — 1) cells in 
each new hyperplane, which, by Leisenring’s observation, will be equal 
to C(n — 1, k — 1). Consequently, 


(2) $C(n, k) = C(n - 1, k) + (n - 1)\, C(n - 1, k - 1).$
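
Recurrence (2) is easily evaluated by machine. The Python sketch below is a minimal illustration; the function name `max_regions` and the boundary conditions (one region when there is a single object or when the space has zero dimensions) are assumptions, chosen because they reproduce formula (1) and the value n! at k = n - 1.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def max_regions(n, k):
    """Maximum number C(n, k) of rank orders for n scattered objects in k dimensions.

    Assumed boundary conditions: a single object, or a space of zero
    dimensions, yields exactly one region.
    """
    if n <= 1 or k == 0:
        return 1
    return max_regions(n - 1, k) + (n - 1) * max_regions(n - 1, k - 1)

# Rows are n = 2, ..., 7; columns are k = 1, 2, 3.
for n in range(2, 8):
    print(n, [max_regions(n, k) for k in range(1, 4)])
```

For four objects the sketch returns 7 in one dimension and 18 in two, the figures used in the examples of this section.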


No satisfactory nonrecursive expression for (2) has been found. However,
Dr. R. M. Thrall in a personal communication has pointed out the
identity of the values obtained from it with sums of absolute values of
Stirling numbers of the first kind [6], for which also no general expression
exists. The relation may be written

(3) $C(n, k) = \sum_{m = n - k}^{n} \lvert S_n^m \rvert,$

where $S_n^m$ is a Stirling number of the first kind. Some values of C(n, k) are given in Table 1.
The reader will note that when k = n - 1, C(n, k) = n!. That is, a space of
dimension n - 1 can always account for all possible orderings of n objects,
just as in components analysis a factor space of n dimensions can always
account for all the variance in n tests. Of course such a solution is trivial;
a solution is not counted successful unless it accomplishes some economy of
description by accommodating all or nearly all the data in comparatively
few dimensions.

TABLE 1
Maximum Number C(n,d) of Rank-Orders
Generated by n Objects in d Dimensions

[Table body not recoverable from the source; the legible entries 6, 24, 96, 326, 932 agree with recurrence (2) for d = 3 and n = 3, ..., 7.]
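
Relation (3) can be checked numerically against recurrence (2). In the Python sketch below the summation is taken over m from n - k to n, a range assumed here because it reproduces the recurrence; all function names are hypothetical, and the unsigned Stirling numbers are built from their standard recurrence.

```python
def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, for m = 0..n."""
    row = [1]  # the row for n = 0
    for size in range(1, n + 1):
        prev = row + [0]
        # c(size, m) = c(size - 1, m - 1) + (size - 1) * c(size - 1, m)
        row = [(prev[m - 1] if m > 0 else 0) + (size - 1) * prev[m]
               for m in range(size + 1)]
    return row

def cardinality_by_stirling(n, k):
    """Sum of |S_n^m| for m from n - k to n (m is never taken below zero)."""
    row = unsigned_stirling_first(n)
    return sum(row[max(n - k, 0):])

def cardinality_by_recurrence(n, k):
    """C(n, k) from recurrence (2), with C = 1 for one object or zero dimensions."""
    if n <= 1 or k == 0:
        return 1
    return (cardinality_by_recurrence(n - 1, k)
            + (n - 1) * cardinality_by_recurrence(n - 1, k - 1))

print(unsigned_stirling_first(4))
print(cardinality_by_stirling(4, 2), cardinality_by_recurrence(4, 2))
```

The agreement is not accidental: summing the Stirling recurrence over m from n - k to n gives recurrence (2) term by term.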

There is an obvious similarity between the cardinality criterion just 
discussed and the dimensionality criterion proposed by Bennett [1] for the 
problem of analyzing rank-orders of subjects given by tests, within the general 
model of factor analysis. Bennett’s model is restricted exclusively to the
case in which the rank orders given by tests are monotone, as seems proper
in the factor analysis context. This restricts the possible isotonic regions 
strictly to open regions, by the condition that for two subjects A and B 
(not stimulus objects as here) H(A, B) shall pass through the origin of the 
space. 

On the other hand, there is no such monotonicity assumption made in 
the model of preferential choice used here, as, unlike test performance, 
individual preference may reasonably be thought of as non-monotone. 
Thus, the possibility of both open and closed regions is allowed. In consequence,
other things being equal, the maximum cardinality numbers given
in Table 1 are always greater than or equal to those given in Bennett ([1],
p. 388). However, Table 1 from [1] could be used if there were some way of
knowing from the data that only open regions were represented, as his table
does give the maximum number of such open regions which may occur in
this model. Since each open region in a space must have a mirror
image, another region which has the exact reverse of its rank order, and
since only open regions show such mirror images, one might supplement the
cardinality criterion in the following way. Count the number of mirror-image
pairs of rankings existing in the data, and compare it with one-half
the number of regions listed in Bennett’s table for the appropriate n
and a given r. If this number is exceeded, then the dimensionality is greater
than r.

In the hypothetical example given previously, eighteen rankings were 
returned by the subjects. Reference to Table 1 shows this to be just the 
maximum allowable number for four objects in two dimensions; hence there 
is no reason as yet to believe that the dimension of the space is higher than 
two. 
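
The cardinality criterion amounts to finding the smallest k for which C(n, k) is at least the number of distinct rankings observed. The Python sketch below applies it to the hypothetical example; the names `RANKINGS`, `max_regions`, and `cardinality_lower_bound` are introduced only for this illustration, and the boundary conditions of the recurrence are the assumed ones (one region for a single object or for zero dimensions).

```python
# The eighteen rank orders of the hypothetical example.
RANKINGS = """ABCD ACBD ADBC BCAD CDAB ABDC ACDB ADCB CBAD CDBA
BACD CABD DABC CBDA DCAB CADB DACB DCBA""".split()

def max_regions(n, k):
    """C(n, k) from recurrence (2); one region for n <= 1 or k == 0 (assumed)."""
    if n <= 1 or k == 0:
        return 1
    return max_regions(n - 1, k) + (n - 1) * max_regions(n - 1, k - 1)

def cardinality_lower_bound(rankings):
    """Smallest dimension k whose maximum cardinality accommodates the data."""
    distinct = set(rankings)
    n = len(next(iter(distinct)))  # number of objects ranked
    k = 0
    while max_regions(n, k) < len(distinct):
        k += 1
    return k

print(cardinality_lower_bound(RANKINGS))      # all eighteen rankings
print(cardinality_lower_bound(RANKINGS[:7]))  # only seven distinct rankings
```

The result is only a lower bound: it depends on how many rankings were returned, not on which ones.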

The method of cardinality has the peculiar feature that it does not 
consider what rankings were returned. This gives it a great advantage in 
simplicity and ease of application, but of course it also makes it a very 
insensitive test, likely to give an optimistically low estimate of dimension 
when applied to a chaotic set of rankings. Furthermore, Table 1 indicates 
that in successively higher dimensions the number of different possible 
rankings goes up very rapidly with the number of items, and in most practical 
instances is likely to exceed the whole size of a sample of judges which an 
experimenter might use. This means that the method of cardinality can be 
applied directly (in the form of a direct comparison of the number of distinct 
rankings obtained with the entries in Table 1) only to data (such as repeated 
psychological judgments) in which the number of experimentally independent 
rankings is large, preferably much exceeding the factorial of the number of 
objects ranked. In most other circumstances, one cannot expect to have 
enough experimentally independent rankings to exhaust all the permissible 
rankings tabulated in the appropriate cell in Table 1. 







Dimension by Groups


A space of r dimensions contains every possible ranking of any r + 1 
objects, or expressed another way, contains the complete transposition 
group of simply ordered sets of those objects, provided only that the objects 
are scattered. This observation is independent of the presence in the space 
of objects other than the r + 1; that is, the regions in question may be either 
elemental regions, or the (r + 1)! larger regions determined by the simple 
orderings of the r + 1 objects. On the other hand, a space of dimension r 
cannot contain the complete transposition (permutation) group over r + 2 
objects or more. 

These ideas may be summarized in the following proposition. A set of 
r stimulus points (A, B, --- , Q, R) may be embedded in a space of no fewer 
than r — 1 dimensions if and only if a set of r! regions exists characterized 
by all r! permutations in order of the stimulus set. A proof of the necessary 
condition might be carried out by induction on r, the number of stimuli and 
r — 1, the number of dimensions. This is trivial for r — 1 = 1, since two points 
may always be put onto a line with center halfway between them, generating 
two regions showing the two permutations. 

Now assume the proposition to be true for any r - 1 stimuli in r - 2
dimensions. Consider the set of r stimuli (A, B, ..., Q, R), for which, by
hypothesis, all r! permutations in order exist as regions. If this is true, then
all (r - 1)! permutations in order of the set of r - 1 points (A, B, ..., Q)
must also exist as regions, and this set of r - 1 points is embedded in no fewer
than r - 2 dimensions. Now suppose that stimulus point R is also embedded
in the same subspace of r - 2 dimensions so that all r! permutations correspond
to regions in r - 2 space. This would contradict the cardinality rule
given above, which shows that the maximum number of regions for r stimuli
in r - 2 dimensions is always less than r!. Hence, the dimensionality would be
greater than r - 2, and the set of r points may all be embedded in a space
of no fewer than r - 1 dimensions.

For the sufficient condition, a proof may once again be outlined using 
an induction argument on r and r — 1. The case of r = 2 is trivial. Now 
assume the sufficient condition true for r — 1 points in r — 2 dimensions. 
By hypothesis, the set of r points (A, B, --- , Q, R) requires r — 1 dimensions. 
Thus, some subset of r - 1 points from this set, say (A, B, ..., Q), requires
r — 2 dimensions, so that by assumption all permutations in order for this 
set must exist as regions in the r — 2 space. The locus H(A, B, --- , Q) will 
exist as a point in the r — 2 space, by the rule of r — n + 1 for the dimension- 
ality of a locus of n points. By the same rule, when the set of r — 1 points 
is embedded in an r — 1 space, the locus of equidistance would be a line L 
bounding all (r — 1)! regions showing the permutations in order for the set. 

Since the addition of stimulus point R to the space requires, by hypothesis,
dimensionality r - 1, so that R may not be embedded in the r - 2
space, the intersection of H(A, R), for example, and H(A, B, ..., Q) must
exist; otherwise the required dimensionality would be r - 2. Furthermore,
by the transitivity of the equality relation, this point of intersection must
also be the point of intersection of H(A, B, ..., Q) with all of the remaining
loci H(B, R), H(C, R), ..., H(Q, R), so that by definition the intersection
is the locus H(A, B, ..., Q, R). Now any region showing a permutation of
(A, B, ..., Q) such as ABC...Q must be bounded by the line L. The
intersection of L with H(A, R) creates two new regions such as RABC...Q
and ARBC...Q, the intersection of H(B, R) with L another two new regions
such as ARBC...Q, ABRC...Q, and so on. Since the intersections of a
basic region showing a particular permutation of A through Q with the r - 1
hyperplanes of the form H(A, R) must generate r distinct new regions, each
showing the same permutation of A through Q but a different position of R,
and since there were originally (r - 1)! basic regions showing permutations
of the set of r - 1 stimulus points, exactly r(r - 1)! or r! regions must be
generated showing all permutations in order of (A, B, ..., Q, R).

This fact can be made the final (and perhaps the most useful) criterion
of dimension. One may be certain that the minimum dimension of the space
in which a complete solution may be realized must be one less than the
number of objects permuted by the largest complete transposition group
present in the experimental data.

Figure 6
Regions Generated by a Second Configuration of Four Points

Returning again to the hypothetical example, the complete group over 
all four objects is not present, since such a group would have 24 elements 
and only 18 are present. Searching for groups over subsets of three objects, 
one finds, for example, every permutation of A, B, and C: 


region ABC, represented by DABC, ADBC, ABDC, and ABCD; 
region BAC, represented by BACD alone; 
region BCA, represented by BCAD alone; 
region CBA, represented by DCBA, CDBA, CBDA, and CBAD; 
region CAB, represented by DCAB, CDAB, CADB, and CABD;
region ACB, represented by DACB, ADCB, ACDB, and ACBD. 


The same is true of each of the other subsets of three objects, as the 
reader will discover on examination. Since there is considerable evidence 
that the solution is two-dimensional, it is possible to attempt a geometric 
realization of the space. Such a construction is given in Figure 6. Using
this construction one may reexamine the previous examples and relate them 
to the properties of the figure. 
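
The search for complete groups over subsets of objects can be carried out mechanically. The Python sketch below is an illustration on the hypothetical example; the names `RANKINGS`, `induced_orders`, and `largest_complete_group` are introduced only here, and the exhaustive search over subsets is practical only for small numbers of objects.

```python
from itertools import combinations
from math import factorial

# The eighteen rank orders of the hypothetical example.
RANKINGS = """ABCD ACBD ADBC BCAD CDAB ABDC ACDB ADCB CBAD CDBA
BACD CABD DABC CBDA DCAB CADB DACB DCBA""".split()

def induced_orders(rankings, subset):
    """Orders of the objects in `subset` implied by each complete ranking."""
    return {"".join(x for x in r if x in subset) for r in rankings}

def largest_complete_group(rankings):
    """Size of the largest subset of objects all of whose permutations occur."""
    objects = sorted(set(rankings[0]))
    for size in range(len(objects), 0, -1):
        for subset in combinations(objects, size):
            if len(induced_orders(rankings, subset)) == factorial(size):
                return size
    return 0

size = largest_complete_group(RANKINGS)
print(size, "objects permute completely; dimension is at least", size - 1)
```

For these data every subset of three objects shows all six of its permutations, while the four objects together show only 18 of 24, so the bound obtained is two dimensions.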

Of these three criteria for dimensionality, the groups criterion just 
discussed is the most practical and, to a large extent, the most sensitive for
use with preference data. Obviously, with a large number of stimulus items, 
say 10, it is seldom possible to accumulate enough data to establish the 
lower limit of the dimensionality at nine since this would require some 10! 
distinct rankings. 

However, it is possible to make a good lower bound estimate from more 
limited data using a modification of this criterion. Given n objects (A, B,
..., Q, R), the required dimensionality is n - 1 if there exist two complete
sets of (n — 1)! permutations in order for some subset of n — 1 objects, 
say (A, B, --- , Q), such that the remaining object R precedes all of the 
remaining objects in each permutation in the first set, and follows all of the 
remaining objects in each permutation of the second set. This is seen to be 
true if it is recalled that the locus of equidistance from r — 1 points in r — 2 
dimensions is a point, so that there may be only one set of “permuting”
regions bounding such a point. If there are two distinct sets of this kind, 
then the locus in question must be at least a line, and the dimensionality 
must be at least r — 1. This modification requires only 2(r — 1)! rank orders 
to establish a lower bound dimensionality of r — 1, rather than the complete 
r!. 

Finally, a very stringent criterion is obtained by a combination of the
principles utilized in the cardinality and the groups criteria. It will be recalled
that the locus of equidistance between two points in r dimensions is a space
of dimensionality r - 1. Each and every pair of regions with rank orders
differing only by a reversal in order of one pair of objects must be separated
only by that locus. Furthermore, for n points in r dimensions, the number of
such pairs differing only by a reversal in the same object pair (the same
hyperplane bounds each pair) can be at most the maximum cardinality of
n - 1 points in r - 1 dimensions. Thus, if a set of such pairs is found, and
the number of such pairs exceeds the maximum cardinality of n - 1 points
in r - 1 dimensions, then the dimensionality must be at least r + 1.

For instance, in Figure 6, there are exactly four pairs of regions differing 
only in the order of A and B: ABCD-BACD, CBAD-CABD, CDAB-CDBA, 
and DCBA-DCAB. Thus the cardinality of these pairs of regions agrees 
exactly with the cardinality for three objects in one dimension—there is 
no evidence from these orders that the dimensionality exceeds two. How- 
ever, had a fifth pair such as DBAC-DABC existed, then the dimensionality 
required would be three. 
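
The count used in this example can be reproduced with a short Python sketch. The names `RANKINGS`, `max_regions`, and `pairs_across` are hypothetical; the sketch counts pairs of observed rankings that differ only by the reversal of one adjacent pair of objects and compares the count with C(n - 1, r - 1) computed from recurrence (2) under the assumed boundary conditions.

```python
# The eighteen rank orders of the hypothetical example.
RANKINGS = """ABCD ACBD ADBC BCAD CDAB ABDC ACDB ADCB CBAD CDBA
BACD CABD DABC CBDA DCAB CADB DACB DCBA""".split()

def max_regions(n, k):
    """C(n, k) from recurrence (2); one region for n <= 1 or k == 0 (assumed)."""
    if n <= 1 or k == 0:
        return 1
    return max_regions(n - 1, k) + (n - 1) * max_regions(n - 1, k - 1)

def pairs_across(rankings, a, b):
    """Pairs of observed rankings that differ only in the order of a and b."""
    present = set(rankings)
    pairs = []
    for r in sorted(present):
        i = r.index(a)
        if i + 1 < len(r) and r[i + 1] == b:
            reversed_ab = r[:i] + b + a + r[i + 2:]
            if reversed_ab in present:
                pairs.append((r, reversed_ab))
    return pairs

observed = pairs_across(RANKINGS, "A", "B")
limit = max_regions(3, 1)  # three remaining objects, one dimension fewer
print(len(observed), limit)

# Adding the ranking DBAC creates the fifth pair DABC-DBAC mentioned in the text.
augmented = pairs_across(RANKINGS + ["DBAC"], "A", "B")
print(len(augmented) > limit)
```

With the original data the count equals the limit, so two dimensions remain admissible; with the added ranking the limit is exceeded and three dimensions would be required.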

This last criterion is, of course, the most sensitive of those discussed 
here, particularly with small numbers of stimulus objects. Actually, it may 
be too sensitive for use with fallible data such as would reasonably be obtained, 
since the estimated dimensionality depends somewhat more upon single 
occurrences among the rank orders than one would expect with the groups 
criterion alone. Thus, in practice, the application of the groups criterion is 
perhaps the method of choice for a “manageably low” dimensionality estimate. 

Incidentally, the method for describing configuration for such data is 
also based principally on the incidence of such permutation groups among 
the rank orders, so that once a preliminary estimate has been made for the 
required dimensionality, there are continual checks upon the estimate in 
the remainder of the procedure. 


REFERENCES 


[1] Bennett, J. F. Determination of the number of independent parameters of a score
matrix from the examination of rank orders. Psychometrika, 1956, 21, 383-393.

[2] Coombs, C. H. Psychological scaling without a unit of measurement. Psychol. Rev.,
1950, 57, 145-158.

[3] Coombs, C. H. A theory of psychological scaling. Engineering Res. Inst. Bull. No. 34,
Ann Arbor: Univ. Michigan Press, 1952.

[4] Errara, A. Du coloriage des cartes et de quelques questions d’analysis situs. Paris:
Gauthier-Villars, 1921.

[5] Guttman, L. A basis for scaling qualitative data. Amer. sociol. Rev., 1944, 9, 139-150.

[6] Jordan, K. The calculus of finite differences. New York: Chelsea, 1947.

[7] Thurstone, L. L. Rank order as a psychophysical method. J. exp. Psychol., 1931, 14,
187-201.


Manuscript received 1/3/59 


Revised manuscript received 8/30/59 











PSYCHOMETRIKA—VOL. 25, NO. 1 
MARCH, 1960


A CRITERION FOR SELECTING VARIABLES 
IN A REGRESSION ANALYSIS 


H. LINHART


SOUTH AFRICAN COUNCIL FOR SCIENTIFIC AND 
INDUSTRIAL RESEARCH, JOHANNESBURG 


Methods are given for deciding whether to use some or no predictor
variables in a regression analysis. Previously obtained results on the more
general problem, whether to use k or k - r predictor variables, are reviewed
with emphasis on applications.


In [8] a criterion for selecting variables in a regression analysis was
introduced. The aim of this paper is to extend these results and to discuss
applications in more detail. The suggested criterion applies to multivariate
regression analysis. It is assumed that the variable to be predicted, $x_0$, and
the predictor variates $x_1, x_2, \ldots, x_k$ have a k + 1 dimensional normal
distribution. The criterion is not suitable for the case where $x_1, x_2, \ldots, x_k$
are parameters rather than random variables.

Whether certain r out of the k available predictor variates should be 
included in the analysis is a question nearly always asked if a regression 
analysis is planned. The use of all k variates is generally believed to be inad- 
visable. The author’s earlier investigation [8] supported this point of view 
and gave a statistical method for the necessary decision. 

The basis of the criterion is a measure for the precision of prediction.
If one wants to predict $x_0$ by means of $x_1, x_2, \ldots, x_k$, one takes at first
a “regression sample” of size n from the k + 1 dimensional distribution of
$x_0, x_1, \ldots, x_k$. The regression sample is then used to estimate the regression
coefficients. Afterwards a number of “predictor sets” (samples from the
k-variate distribution of $x_1, \ldots, x_k$) are sampled randomly and the
corresponding values of $x_0$ are predicted. For a fixed regression sample and a fixed
predictor set the length of a confidence interval for $x_0$, l, say, may be
calculated. The expected value of l, E(l), where the expectation must be taken over
all possible regression samples of size n and over all possible predictor sets,
may be taken as a measure for the precision of prediction within the given
population if regression samples of size n are used.

It can be shown that E(l) may be smaller if only certain k — r of the 
available k predictor variates are used, as compared with E(l) for all k pre- 
dictor variates. The precision of prediction, in the above mentioned sense, 
may therefore deteriorate if more variables are used. 




To decide whether certain r variables should be used or not, it is
suggested that the hypothesis $H_0 : E(l) \geq E_{[r]}(l)$ be tested and that the r
variables be used only if this hypothesis is rejected, that is, if the alternative
hypothesis $H_1 : E(l) < E_{[r]}(l)$ is statistically “proved.” (The index [r] always
relates the symbol to which it is annexed to the case where r of the k
variables are omitted.)

In [8] two cases were distinguished: in the first case one wants to predict
$x_0$; and in the second case one wants to predict $E(x_0)$, the point on the
regression hyperplane. The author is now of the opinion that in the second
case the question whether some variable should be excluded from the analysis
cannot occur in practice; it is therefore not considered in the present
discussion.

The results obtained in [8] were

$$E(l) = \frac{2^{3/2}\,(n+1)^{1/2}\,\Gamma\!\left(\frac{n}{2}\right)}{(n-k-1)^{1/2}\,n^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\left(\frac{\Lambda}{\Lambda_0}\right)^{1/2} t_{n-k-1}, \tag{1}$$

where k ≥ 1 is the number of predictor variables, n the size of the regression
sample, $\Lambda$ the determinant of the moment matrix in the population of
$x_0, x_1, \ldots, x_k$. $\Lambda_0$ is the corresponding determinant in the population of
$x_1, x_2, \ldots, x_k$. $t_\nu$ is the 5 percent point (double tailed) of Student’s
distribution with $\nu$ degrees of freedom. [For a summary of a proof of (1), see
appendix.]

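An unbiased estimate of (1) is given by

$$\hat{E}(l) = \left[\varepsilon_{kn}\, l_{00}\,(1 - R^2)\right]^{1/2} c_{kn}, \tag{2}$$

where

$$\varepsilon_{kn} = \frac{n+1}{n-k-1}\, t_{n-k-1}^2, \qquad c_{kn} = \frac{2\,\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{n-k-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma\!\left(\frac{n-k}{2}\right)} \tag{3}$$

are constants which are given in Table 4. R is the multiple correlation
coefficient and

$$n\, l_{00} = \sum_{\nu=1}^{n} (x_{0\nu} - \bar{x}_0)^2.$$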

For the test of the hypothesis $E(l) \geq E_{[r]}(l)$, the statistic

$$u = \left(\frac{R^2 - R_{[r]}^2}{1 - R_{[r]}^2}\right)^{1/2} \tag{4}$$

may be used, which is distributed like a multiple correlation coefficient
obtained from a sample of size N = n - k + r out of a population with r
“independent” variables, having a multiple correlation

$$U = \left(\frac{P^2 - P_{[r]}^2}{1 - P_{[r]}^2}\right)^{1/2}. \tag{5}$$

P is the multiple correlation of $x_0$ on $x_1, \ldots, x_k$.
The hypothesis $H_0 : E(l) \geq E_{[r]}(l)$ is equivalent to the hypothesis
$H_0 : U \leq U_0$, where the value of $U_0$ may be obtained using (1),

$$U_0 = \left(1 - \frac{\varepsilon_{k-r,n}}{\varepsilon_{kn}}\right)^{1/2}. \tag{6}$$

As no tables of percentage points of the general distribution of the mul- 
tiple correlation coefficient are available, graphs of lower confidence limits 
for multiple P must be used for carrying out the test. (Such graphs are given 
for r = 1(2)7 in [5], and for r = 1(1)7 in [8].) 

The test is then as follows. If the lower confidence limit on U,
corresponding to a given u, N, and r, exceeds $U_0$, reject $H_0$. (Locate the point
$(u; U_0)$, where the coordinate u refers to the sample correlation and the
coordinate $U_0$ to the population correlation in the graph giving a curve of
the lower confidence limit. If the point is below the curve, reject the
hypothesis.)

For a numerical illustration of this u test, see the example below. 


The Trivial Prediction 


The above results were obtained for any k ≥ 1 and any r such that
k - r ≥ 1. They may be extended to the case k - r = 0, pertinent to the
question whether it is worth while attempting a prediction with k predictor
variables or whether it is better to use no predictor variable.

This question looks strange but it is a real one. The regression sample
gives some information on the distribution of $x_0$. If one uses it for the trivial
“prediction”: “the value of $x_0$ will be $\bar{x}_0$,” one may again find the expected
value of the length of the confidence interval for $x_0$ and compare it with the
expected length if some predictor variates are used. As could be expected,
there are cases where the trivial prediction is better. To give an example:
for a sample size of 30 and one predictor variable, the correlation between 
the predictor variable and the variable to be predicted must be larger than 
.19 in absolute value for the precision of the prediction to be better than 
the precision of the trivial prediction [see (12)]. This extreme case, the 
question whether a prediction should be tried or not, is probably best suited 
to make the method understandable. 
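
The threshold quoted above can be checked numerically. The sketch below assumes that the constants $\varepsilon_{kn}$ are proportional to $t_{n-k-1}^2/(n-k-1)$ for fixed n, so that $U_0 = (1 - \varepsilon_{k-r,n}/\varepsilon_{kn})^{1/2}$ depends only on two percentage points of Student's distribution; these are computed here by numerical integration and bisection. The function names are illustrative.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_5pct(nu, steps=2000):
    """Double-tailed 5 percent point: bisection on a Simpson-rule CDF."""
    def cdf(x):
        h = x / steps
        total = t_pdf(0.0, nu) + t_pdf(x, nu)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
        return 0.5 + total * h / 3
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < 0.975:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def u0_exact(n, k, r):
    """Smallest population correlation U for which using all k variables
    gives a shorter expected interval than using only k - r of them."""
    nu_full = n - k - 1          # degrees of freedom with all k variables
    nu_reduced = n - k + r - 1   # degrees of freedom with k - r variables
    ratio = (t_5pct(nu_reduced) ** 2 / nu_reduced) / (t_5pct(nu_full) ** 2 / nu_full)
    return math.sqrt(1 - ratio)

# Sample size 30, one predictor against the trivial prediction (k = r = 1).
print(round(u0_exact(30, 1, 1), 2))
```

For n = 30 and k = r = 1 the function returns a value that rounds to .19, in agreement with the figure given in the text.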

If one wishes to predict a variable by another correlated one, it is not
enough that the correlation between the two variables is significant. It
must be larger in absolute value than some positive constant which depends
on the sample size to ensure that the precision of the prediction is not worse
than the precision of the trivial prediction. The improvement in precision
may be judged by calculating $\hat{E}(l)$.

To obtain better precision of prediction with k variables than with 
k — r variables, it is not enough that the multiple correlation coefficient 
increases if the further r variables are included in the analysis, but it must 
increase by more than a certain positive constant, the value of which depends 
mainly on the sample size. Again the calculation of $\hat{E}(l)$ will demonstrate the
gain in precision. 

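If $\bar{x}_0$, the sample mean of $x_0$, is used to predict $x_0$, one may determine
confidence limits for $x_0$ by using

$$t = (x_0 - \bar{x}_0)\left[\frac{n-1}{l_{00}\,(n+1)}\right]^{1/2}, \tag{7}$$

which has Student’s distribution with n - 1 d.f. The length of this interval
is then

$$l = 2\, t_{n-1}\left(l_{00}\,\frac{n+1}{n-1}\right)^{1/2}. \tag{8}$$

As $n l_{00}/\Lambda_{00}$ has a $\chi^2$ distribution with n - 1 d.f., the expectation of l
may easily be found. ($\Lambda_{00}$ is the variance of $x_0$.) One has

$$E(l) = \frac{2^{3/2}\,(n+1)^{1/2}\,\Gamma\!\left(\frac{n}{2}\right)}{(n-1)^{1/2}\,n^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,\Lambda_{00}^{1/2}\, t_{n-1}. \tag{9}$$

An unbiased estimator of E(l) is now

$$\hat{E}(l) = 2\left(\varepsilon_{0n}\, l_{00}\right)^{1/2}, \tag{10}$$

where $\varepsilon_{0n}$ is given by (3), which remains valid for k = 0.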
The statistic used for the test of the hypothesis $E(l) \geq E_{[r]}(l)$ is now

$$u = R, \tag{11}$$

which is distributed like a multiple correlation coefficient obtained from a
sample of size N = n out of a population with r independent variables,
having a multiple correlation U = P.

The hypothesis $E(l) \geq E_{[r]}(l)$ should be rejected if the point $(u; U_0)$,
where

$$U_0 = \left(1 - \frac{\varepsilon_{0n}}{\varepsilon_{kn}}\right)^{1/2}, \tag{12}$$

is below the corresponding curve in graphs of lower confidence limits for
multiple P.

It may be seen that all obtained results are straightforward extrapolations
of those obtained for k - r > 0.


Further Notes for Applications 


Strictly speaking, the test should only be applied if the hypotheses are 
formulated without using the data. If values of $\hat{E}(l)$ are computed from the
regression sample in order to select a set of predictor variables, the test no 
longer is exact. For determining the better of two sets of variables, if both 
sets have been selected on the basis of the sample data, the distortion in 
the distribution of u is not expected to be great. 

If one set only has been chosen and the other set has been determined 
a priori, the test favors the chosen set. In most of these cases the chosen 
set will be the one with fewer variables and, therefore, more often than 
justified by the rationale of the test, the set with fewer variables will finally 
be used in the regression analysis. 

It is recommended that the hypotheses be formulated a priori, without 
using the regression sample, wherever that is possible. Then the method is 
exact. But even if that should not be possible, the test should be used. The 
deviations from the results which one would obtain if the exact procedure 
were used are, in the majority of cases, in a direction in which they do no 
harm. 

The criterion has been formulated such that the hypothesis
$E(l) \geq E_{[r]}(l)$ must be rejected before the r further variables should be used. The
criterion becomes stringent in the sense that for small differences in the
values of $E(l)$ and $E_{[r]}(l)$ the further variables are more often wrongly
excluded than wrongly included. The criterion could even be made more
stringent by demanding that the hypothesis $E(l)/E_{[r]}(l) \geq a$, where a is a
constant, a < 1, must be rejected before the r variables can be used in the
analysis. It may happen that a regression sample of size n is available and that
one wants to estimate E(l) corresponding to a sample size n’ > n. For
instance, a pilot study might be made in order to obtain a general impression
of what could be attained and to decide on which variables should be used.
Then all tests and estimators should be based on the sample size n’, which
is ultimately to be used.

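For this case the formula for the unbiased estimator becomes

$$\hat{E}(l) = 2\left[\varepsilon_{kn'}\,\frac{n}{n'}\, l_{00}\,(1 - R^2)\right]^{1/2} \frac{\Gamma\!\left(\frac{n'}{2}\right)\Gamma\!\left(\frac{n-k-1}{2}\right)}{\Gamma\!\left(\frac{n'-1}{2}\right)\Gamma\!\left(\frac{n-k}{2}\right)}. \tag{13}$$
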


The test of the hypothesis $E(l) \geq E_{[r]}(l)$ (if a regression sample of size
n is available for the computation of u according to (4)) becomes: reject
the hypothesis if the point $(u; U_0)$, where

$$U_0 = \left(1 - \frac{\varepsilon_{k-r,n'}}{\varepsilon_{kn'}}\right)^{1/2}, \tag{14}$$

is below that curve giving lower confidence limits for multiple P which
corresponds to a sample size N = n - k + r and to r independent variables.
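
The estimate for a planned sample size n’ can be sketched in the same way. The code below is an illustration only: it assumes that the unbiased estimator has the form $2[\varepsilon_{kn'}(n/n')\,l_{00}(1-R^2)]^{1/2}\,\Gamma(n'/2)\Gamma((n-k-1)/2)/[\Gamma((n'-1)/2)\Gamma((n-k)/2)]$ with $\varepsilon_{kn'} = (n'+1)\,t_{n'-k-1}^2/(n'-k-1)$, which follows from (1) and the chi-square distribution of $n l_{00}(1-R^2)$; the percentage points of Student's distribution are computed numerically, and the function names are hypothetical.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_5pct(nu, steps=2000):
    """Double-tailed 5 percent point: bisection on a Simpson-rule CDF."""
    def cdf(x):
        h = x / steps
        total = t_pdf(0.0, nu) + t_pdf(x, nu)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
        return 0.5 + total * h / 3
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < 0.975:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def estimated_length_future(n, n_future, k, l00, r_squared):
    """Estimate of E(l) for regression samples of size n_future,
    computed from a pilot sample of size n (assumed form, see lead-in)."""
    nu = n_future - k - 1
    eps = (n_future + 1) * t_5pct(nu) ** 2 / nu
    log_gamma_ratio = (math.lgamma(n_future / 2) + math.lgamma((n - k - 1) / 2)
                       - math.lgamma((n_future - 1) / 2) - math.lgamma((n - k) / 2))
    scale = math.sqrt(eps * (n / n_future) * l00 * (1 - r_squared))
    return 2 * scale * math.exp(log_gamma_ratio)

# Pilot sample n = 28, planned sample n' = 100, criterion s.d. 45.39.
print(estimated_length_future(28, 100, 0, 45.39 ** 2, 0.0))
print(estimated_length_future(28, 100, 1, 45.39 ** 2, 0.564 ** 2))
```

With the data of the example in this paper the two calls give values close to 186 and 157, the entries listed for n’ = 100 in Table 3. Setting n_future equal to n reduces the function to the estimate (2).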

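The ratio $\varepsilon_{k-r,n}/\varepsilon_{kn}$ used for the test of the hypothesis that
$E(l) \geq E_{[r]}(l)$ is usually affected very little by the factor

$$\frac{t_{n-k+r-1}^2}{t_{n-k-1}^2}.$$

The approximation

$$\frac{\varepsilon_{k-r,n}}{\varepsilon_{kn}} \approx \frac{n-k-1}{n-k+r-1} \tag{15}$$

should therefore be used. For n > 20 this approximation is correct to two
significant figures.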

If the question is whether or not a single variable should be used in the 
analysis (r = 1), tables of confidence limits for the product moment cor- 
relation coefficient may be used for the test, as the multiple correlation 
coefficient is, in this case, identical to the absolute value of the product 
moment correlation coefficient. 

Those tables [2, 7] give confidence intervals (upper and lower confidence 
limits) for the ordinary correlation coefficient. If the lower limits of those 
intervals are changed to zero whenever they are negative, they are lower 
confidence limits for the multiple correlation coefficient. These lower con- 
fidence limits for the multiple correlation coefficient belong to the same con- 
fidence coefficient as the corresponding confidence intervals for the product 
moment correlation coefficient. (The lower limit of the 95 percent con- 
fidence interval for the correlation coefficient is, e.g., the 95 percent lower 
confidence limit for the multiple correlation coefficient.) 


Summary of Instructions Regarding the Tests and Estimation 


To decide whether all k or certain k - r predictor variates should be
used, one computes u [given by (4)] and $\sqrt{r/(n - k + r - 1)}$. Then one
finds, in graphs of lower confidence limits for multiple P, the curve which
corresponds to r independent variables and a sample size of N = n - k + r.

If the point $(u; U_0)$, where

$$U_0 \approx \left(\frac{r}{n-k+r-1}\right)^{1/2}, \tag{16}$$

is above the curve, only k - r variables must be used. For the special case
k - r = 0, u is given by (11). Unbiased estimates for E(l) are given by (2), and
for the case where k = 0, by (10). When a sample has a size different from
that which one wants ultimately to employ, the decision procedure has to
be modified as described above. The unbiased estimator of E(l) is given
by (13).


An Example of Application 


An engineering firm installing passenger and goods lifts wanted to im- 
prove their methods of selecting lift mechanic apprentices. A group of 28 
apprentices, who were already in the firm, were tested on a battery of psycho- 
logical tests; merit ratings were obtained from the supervisors. A regression 
analysis was carried out to find means for predicting merit ratings of later 
applicants. 

The standard deviation of the criterion variable is 45.39; the correlation
matrix is given in Table 1. The variables are (1) a general intelligence
test, (2) a mechanical comprehension test, (3) an arithmetic test, (4) a
technical information test, (5) a form relations test, (6) a test of visual speed,
(7) a two-hand coordination test, (8) the Pauli test, (9) the Bell adjustment
inventory; criterion: merit ratings.


TABLE 1
Correlation Matrix
(entries shown as ... are illegible in the source)

         2      3      4      5      6      7      8      9
  1    .380   .336   .533   .273    ...    ...   .043   .086
  2           .197   .558   .530    ...    ...   .119   .050
  3                  .581   .375    ...    ...   .261  -.177
  4                         .354    ...    ...   .182  -.087
  5                                 ...    ...   .137  -.178
  6                                        ...   .151  -.037
  7                                             -.137   .241
  8                                                     .281

It was not possible to order the variables a priori according to their 
predictive value. The problem was thus to select the most suitable tests. 
For this purpose the Wherry test selection method, with the square root 
method of reduction was used. This procedure has been described in detail 








enerea osn you op‘ *fet you *dfg 
*(226/T! Le¢*) *9Z=M‘OOT=.U 
*G*ezea osn ‘efor *dhg 
§(,96/t!Le%") ‘Lz=x‘ooT=.t cle" €z6* tee" lse- Stes its Sz6e- | **s 
*G°ava ogn you op‘ fez you *dig 612°- TIZ* OTE" S066" zee° sree 6tS* oz" 
£(,92/t! Len) ‘lzen‘ezeu 126° 196° 226° Te6* 96° 26° 986° 866° 
*g°azva ogn ‘for -dhy Tez o0O'T LET°- TST’ LET* zZet* 92 EIT’ 4° 
(g42/t! 49S) ‘@Z=N‘Qz=u °T 000°T 000°T 000°T 000°T 000°T 000°T 000°T 000°T 000°T 


ts, 
Fe. 


Tt, 





90° Toz*.  %To* €IT° tLz° 
"9S* Tez OOO*T LET*- TStT* LET* wet Tze EIT 
Lloe “9z° Ss St" m9C° Lez° 
%z0° svt* $lo° olz: SS¢° 


“ 
id 
a 
= 
& 
a 
iS) 
oo) 
4 
a 
MQ 
4 


z9T° 66¢° 944° BLT°= LET* O8zZ* Bze* OOO'T WSE* SlE* oOFse 
eZo° eSz° 6ST° z6¢° s6*%° 
TOO* Tzo° 420° 6ST° 90£° 
800° 820° 780° Lez° "GE 
TEO° 69T° éio* elz° zZo¢° 


A NN te OUT UO OE ODO NN 





Feg/tea 78a =Ffa/Ffa | F Fa | amg "3320 6 8 L 








SOTQVTIeA Jo WOTZoSeTSEG 
2 TTAVs 





H. LINHART 53 


by Dwyer ((4], pp. 19-26). This method (cf. also [9]) selects at first the 
variable having the highest correlation with the criterion, then that one of 
the remaining variables which yields the highest increase in multiple R, and 
so on. This procedure will usually render that group of k — r variables 
which has the highest multiple R as compared with all other groups of k — r 
variables. The increments in the square of multiple R can at each stage be 
read from the computing sheet. The tests described above can therefore be 
carried out at each stage of the selection. 

In Table 2, giving the computations, Dwyer’s notation has been used.
The computing scheme is essentially that of the square root method. In
addition the values of $z_{ji}$ and $v_{ji}$, which are necessary for the selection of
the jth variable, are computed. The increment in the square of multiple R,
if the variable i were used in addition to the j - 1 variables selected before,
is given by $v_{ji}^2/z_{ji}$.

For the convenience of the reader, Dwyer’s formulas (13) and (27)
([4], p. 17, p. 25) are repeated here (0 is the criterion variable, a is the first
variable selected, b the second, c the third, etc.).

$$s_{ai} = r_{ai}, \qquad s_{bb \cdot a} = (1 - s_{ab}^2)^{1/2}, \qquad s_{bi \cdot a} = (r_{bi} - s_{ab}\, s_{ai})/s_{bb \cdot a},$$
$$s_{cc \cdot ab} = (1 - s_{ac}^2 - s_{bc \cdot a}^2)^{1/2},$$
$$s_{ci \cdot ab} = (r_{ci} - s_{ai}\, s_{ac} - s_{bi \cdot a}\, s_{bc \cdot a})/s_{cc \cdot ab};$$
$$z_{1i} = 1, \qquad v_{1i} = r_{i0},$$
$$z_{2i} = z_{1i} - s_{ai}^2, \qquad v_{2i} = v_{1i} - s_{ai}\, s_{a0},$$
$$z_{3i} = z_{2i} - s_{bi \cdot a}^2, \qquad v_{3i} = v_{2i} - s_{bi \cdot a}\, s_{b0 \cdot a},$$
$$z_{4i} = z_{3i} - s_{ci \cdot ab}^2, \qquad v_{4i} = v_{3i} - s_{ci \cdot ab}\, s_{c0 \cdot ab}.$$

The computations proceed on the following lines.

1. Write the correlations in the criterion column. Select as first variable,
a, the variable having the largest correlation with the criterion. Apply
the u test. Only if the null hypothesis is rejected use variable a as
predictor and continue computations.

2. Fill in the correlations of all variables with variable a. Compute the rows
$s_{ai}$ and $z_{2i}$, then the columns $v_{2i}$ and $v_{2i}^2/z_{2i}$. Select as next variable, b, the
variable having the largest value of $v_{2i}^2/z_{2i}$. Carry out the u test. Use
variable b (in addition to variable a) as predictor and continue
computations only if the null hypothesis is rejected.

3. Fill in the correlations of all variables with variable b. Compute the rows
$s_{bi \cdot a}$ and $z_{3i}$, then the columns $v_{3i}$ and $v_{3i}^2/z_{3i}$. Select as third variable, c,
the variable having the largest value of $v_{3i}^2/z_{3i}$. Make the u test. Use
variable c (in addition to variables a and b) as predictor and continue
computations only if the null hypothesis is rejected.

4. Carry on with the procedure until a u test does not reject the null
hypothesis or until all variables have been selected as predictors.
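
The selection steps above can be sketched in code. The following Python function is a minimal illustration of stepwise selection by the square root method, assuming a symmetric predictor correlation matrix; it returns the order of selection and the increment in the square of multiple R at each stage (the quantities $v_{ji}^2/z_{ji}$). The u test that decides whether to stop relies on graphs of confidence limits and is therefore left to the caller; the function and argument names are hypothetical.

```python
import math

def forward_select(R, r0, max_vars=None):
    """Order predictors by successive gains in squared multiple correlation.

    R  : symmetric predictor intercorrelation matrix (list of lists)
    r0 : correlations of each predictor with the criterion
    Returns a list of (predictor_index, gain_in_R_squared) in selection order.
    """
    p = len(r0)
    resid = [row[:] for row in R]   # residual covariances among predictors (z on the diagonal)
    v = list(r0)                    # residual covariances with the criterion
    remaining = set(range(p))
    order = []
    while remaining and (max_vars is None or len(order) < max_vars):
        # Skip predictors that are already fully explained by those selected.
        candidates = [i for i in remaining if resid[i][i] > 1e-12]
        if not candidates:
            break
        best = max(candidates, key=lambda i: v[i] ** 2 / resid[i][i])
        gain = v[best] ** 2 / resid[best][best]
        root = math.sqrt(resid[best][best])
        s = [resid[best][i] / root for i in range(p)]   # square-root row for this stage
        s0 = v[best] / root
        for i in range(p):
            v[i] -= s[i] * s0
            for j in range(p):
                resid[i][j] -= s[i] * s[j]
        remaining.remove(best)
        order.append((best, gain))
    return order

# Two predictors correlating .5 and .4 with the criterion and .3 with each other.
steps = forward_select([[1.0, 0.3], [0.3, 1.0]], [0.5, 0.4])
print(steps)
print(sum(gain for _, gain in steps))   # squared multiple R with both predictors
```

In practice the caller would compute u from each gain according to (4) and stop at the first stage where the hypothesis is not rejected.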


In this example variable 8 is selected at first, as it has the largest
correlation with the criterion. According to (11) one has u = .564 and according
to (16) $U_0 \approx \sqrt{1/(28 - 1 + 1 - 1)} = .192$. The point with coordinates
(.564; .192) is below the curve corresponding to a sample of size
N = n - k + r = 28 - 1 + 1 = 28 in Ezekiel’s Figure B ([5], p. 506) or,
alternatively, the point (.564; .192) is below the curve giving the lower 95 percent
confidence limits in David’s Chart II [2]. The null hypothesis is rejected;
variable 8 should be used as a predictor.

Variable 5 is selected next as it has the largest value (.162) in column
$v_{2i}^2/z_{2i}$. For carrying out the u test, again u, now given by (4), must be
calculated: $R^2 - R_{[r]}^2 = v_{25}^2/z_{25} = .162$; $1 - R_{[r]}^2 = 1 - .564^2 = .682$; u = .487.
According to (16), $U_0 \approx \sqrt{1/(28 - 2 + 1 - 1)} = .196$, and the point
(.487; .196) is above the curve corresponding to a sample size of
N = 28 - 2 + 1 = 27 in David’s Chart II [2]. The null hypothesis is not rejected, so
variable 5 should not be used as an additional predictor.

There is no disproof of the hypothesis that the variables 5 and 8 together
will yield a worse prediction than the variable 8 alone, if a sample of 28 has
been used to estimate the regression coefficients. In the example there is a
chance that the sample size may later be increased to n’ = 100. For n’ = 100
variable 5 passes the test.

According to (14) one has now $U_0 \approx \sqrt{1/(100 - 2 + 1 - 1)} = .101$;
u remains as before, u = .487. The point (.487; .101) is below the curve for
N = 27 in David’s [2] Chart II. The null hypothesis is rejected and in this
case variable 5 should be used, together with variable 8, as predictor.
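
The coordinates used in these tests are simple to recompute. The short Python sketch below evaluates u from the increment in the square of multiple R, as in (4), and the approximate $U_0 = \sqrt{r/(n - k + r - 1)}$; the comparison with the confidence-limit curves still has to be made on the published charts. The function names are illustrative.

```python
import math

def u_statistic(r2_full, r2_reduced):
    """u of (4): partial multiple correlation of the r added variables."""
    return math.sqrt((r2_full - r2_reduced) / (1 - r2_reduced))

def u0_approx(n, k, r):
    """Approximate critical population value U_0 for sample size n."""
    return math.sqrt(r / (n - k + r - 1))

r2_var8 = 0.564 ** 2                      # squared correlation of variable 8 with the criterion
u = u_statistic(r2_var8 + 0.162, r2_var8) # variable 5 adds .162 to the square of multiple R
print(round(u, 3))                        # .487
print(round(u0_approx(28, 1, 1), 3))      # .192, first stage
print(round(u0_approx(28, 2, 1), 3))      # .196, second stage, n = 28
print(round(u0_approx(100, 2, 1), 3))     # .101, second stage, n' = 100
```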

Variable 4 is next to be selected because it has the largest value in
column $v_{3i}^2/z_{3i}$ (.078). One has $R^2 - R_{[r]}^2 = .078$,
$1 - R_{[r]}^2 = 1 - (.564^2 + .162) = .520$, u = .387 (see (4)),
$U_0 \approx \sqrt{1/(100 - 3 + 1 - 1)} = .102$ (cf. (14)), and the point (.387; .102)
is above the curve for N = 28 - 3 + 1 = 26 in David’s Chart II [2]. The null
hypothesis is not rejected; variable 4 should not be used as an additional
predictor. No further computations are necessary. One should thus finally use
variables 5 and 8 in the analysis.


TABLE 3
Estimated Expected Lengths of Confidence Intervals

Used variables    none     8      8,5    8,5,4

n = 28            193     166     151     145
n' = 100          186     157     141     133


TABLE 4
Table of $c_{kn}$ and $t_\nu$

[The table is printed rotated in the source and its entries are not recoverable;
rows are indexed by n (20 to 200) and columns by k (0 to 20).]


The expected lengths of the confidence intervals were estimated using 
formulas (2), (10), and (13). They are listed in Table 3. The back solutions
to obtain the estimated regression coefficients have not been carried out 
here, since it is planned to increase the sample size. 


Appendix 


As the author’s previous article [8] is not readily available, a referee 
suggested that a summary of the derivation of formula (1) be given here. 
Let $\mathbf{L}$ be the moment matrix in the regression sample, having elements

$$l_{ij} = \frac{1}{n}\sum_{\nu=1}^{n} (x_{i\nu} - \bar{x}_i)(x_{j\nu} - \bar{x}_j), \qquad i, j = 0, 1, 2, \ldots, k, \tag{17}$$

$\mathbf{L}_0$ the matrix with elements $l_{ij}$, i, j = 1, 2, ..., k, and $L$ and $L_0$ the
corresponding determinants. If the predictor set is denoted by $y = \{y_1, y_2, \ldots, y_k\}$
and the vector of sample means by $\bar{x} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k\}$, the length of
the 95 percent confidence interval for $x_0$ may be written in the form

$$l = 2\, t_{n-k-1}\,(n-k-1)^{-1/2}\left(\frac{L}{L_0}\right)^{1/2} (n + 1 + T)^{1/2}, \tag{18}$$

where

$$T = (y - \bar{x})'\,\mathbf{L}_0^{-1}\,(y - \bar{x}) \tag{19}$$

is Hotelling’s generalized T ([1], eq. 37.3.5, p. 554).

The expectation of l has to be taken over the variables $l_{ij}$, having
Wishart’s distribution, over the $y_i$ and $\bar{x}_i$, both having multivariate normal
distributions. All three groups of variables are distributed independently
of each other.

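If $f(l_{00}, l_{01}, \ldots, l_{kk})$ is the density of the elements of $\mathbf{L}$ and
$f(l_{11}, l_{12}, \ldots, l_{kk})$ is the density of the elements of $\mathbf{L}_0$, Wilks ([10], eq. 21,
p. 481) has shown that

$$\int \left(\frac{L}{L_0}\right)^{1/2} f(l_{00}, l_{01}, \ldots, l_{kk})\, dl_{00}\, dl_{01} \cdots dl_{0k} = \frac{\Gamma\!\left(\frac{n-k}{2}\right)}{\Gamma\!\left(\frac{n-k-1}{2}\right)}\left(\frac{2\Lambda}{n\Lambda_0}\right)^{1/2} f(l_{11}, l_{12}, \ldots, l_{kk}). \tag{20}$$

One has therefore

$$E(l) = 2\, t_{n-k-1}\,(n-k-1)^{-1/2}\,\frac{\Gamma\!\left(\frac{n-k}{2}\right)}{\Gamma\!\left(\frac{n-k-1}{2}\right)}\left(\frac{2\Lambda}{n\Lambda_0}\right)^{1/2} E\!\left[(n + 1 + T)^{1/2}\right]. \tag{21}$$
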
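As was shown by Hsu [6], T has, for fixed $y_i$, the noncentral F distribution

$$f(T) = e^{-\lambda/2}\sum_{\nu=0}^{\infty}\frac{(\lambda/2)^{\nu}}{\nu!}\,\frac{\Gamma\!\left(\frac{n}{2}+\nu\right)}{\Gamma\!\left(\frac{k}{2}+\nu\right)\Gamma\!\left(\frac{n-k}{2}\right)}\,\frac{T^{k/2+\nu-1}}{(1+T)^{n/2+\nu}}, \tag{22}$$

where

$$\lambda = n\,(y - \xi)'\,\boldsymbol{\Lambda}_0^{-1}\,(y - \xi),$$

$\xi$ is the vector of population means of $x_1, x_2, \ldots, x_k$,
and $\boldsymbol{\Lambda}_0$ is the moment matrix in the population of $x_1, x_2, \ldots, x_k$. The
conditional expectation of $(n + 1 + T)^{1/2}$, given y, is then

$$\int_0^{\infty} (n + 1 + T)^{1/2} f(T)\, dT. \tag{23}$$

One may show that

$$E\!\left[\lambda^{\nu} e^{-\lambda/2}\right] = \frac{\Gamma\!\left(\frac{k}{2}+\nu\right)}{\Gamma\!\left(\frac{k}{2}\right)}\,(2n)^{\nu}\,(n+1)^{-k/2-\nu}, \tag{24}$$

where the expectation is taken over the variables $y_i$, and the over-all
expectation of $(n + 1 + T)^{1/2}$ is therefore

$$E\!\left[(n + 1 + T)^{1/2}\right] = \frac{(n+1)^{(n-k)/2}\,\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{n-k}{2}\right)}\int_0^{\infty} T^{k/2-1}\,(n + 1 + T)^{-(n-1)/2}\, dT = (n+1)^{1/2}\,\frac{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{n-k-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma\!\left(\frac{n-k}{2}\right)}. \tag{25}$$

This, together with (18) and (21), proves (1).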






REFERENCES 


[1] Cramér, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1946. 
[2] David, F. N. Tables of the ordinates and probability integral of the distribution of the 
correlation coefficient in small samples. London: Biometrika Office, 1938. 
[3] Dwyer, P. S. Linear computations. New York: Wiley, 1951. 
[4] Dwyer, P. S. The relative efficacy and economy of various test selection methods. 
Dept. of the Army, U.S.A.G.O., Personnel Res. Sec. Rep. No. 957, 1952. 
[5] Ezekiel, M. Methods of correlation analysis. New York: Wiley, 1941. 
[6] Hsu, P. L. Notes on Hotelling’s generalized T. Ann. math. Statist., 1938, 9, 231-243.
[7] Linhart, H. Confidence limits for the coefficient of product moment correlation in 
bivariate normal populations. J. nat. Inst. Personnel Res., 1956, 6, 153-174. 
[8] Linhart, H. Critère de sélection pour le choix des variables dans l’analyse de régression.
Revue suisse d’Economie politique et de Statistique, 1958, 94, 202-232. 
[9] Summerfield, A. and Lubin, A. A square root method of selecting a minimum set 
of variables in multiple regression: I. The method. Psychometrika, 1951, 16, 271-284. 
[10] Wilks, S. S. Certain generalizations in the analysis of variance. Biometrika, 1932, 
24, 471-494. 


Manuscript received 2/14/59
Revised manuscript received 6/10/59 





PSYCHOMETRIKA—VOL. 25, NO. 1
MARCH, 1960 


SOME MULTIPLE CORRELATION 
AND PREDICTOR SELECTION METHODS 


Harry E. ANDERSON, JR. 


SYSTEM DEVELOPMENT CORPORATION 
AND 


BENJAMIN FRUCHTER 


THE UNIVERSITY OF TEXAS 


The Doolittle, Wherry-Doolittle, and Summerfield-Lubin methods of 
multiple correlation are compared theoretically as well as by an application 
in which a set of predictors is selected. Wherry’s method and the Summerfield- 
Lubin method are shown to be equivalent; the relationship of these methods 
to the Doolittle method is indicated. The Summerfield-Lubin method, because 
of its compactness and ease of computation, and because of the meaningful- 
ness of the interim computational values, is recommended as a convenient 
least squares method of multiple correlation and predictor selection. 


The present study is concerned with a comparison of three methods 
of multiple correlational analysis: the Doolittle method, the Wherry-Doolittle 
method, and the Summerfield-Lubin method. The Doolittle method [3], 
developed in the late nineteenth century, is presented in complete form by 
Dwyer [6]. Wherry’s method [25] appeared in the early thirties as a technique 
for selecting a few, adequate predictors from the available pool of predictors, 
adequate in the sense that the relationship obtained between the criterion 
and the selected predictors would not be significantly increased by the 
inclusion of additional predictors. The Summerfield-Lubin method [19, 22] is 
based upon the square-root method of multiple correlation [7, 8, 15].

The third point in Roff’s article ([20], p. 5) on multiple factor theory
was an important anticipation of the present work: “The square of the
multiple correlation of test j with the n - 1 remaining tests equals the
communality of test j if the group of tests contains r statistically independent
tests each with a communality of unity.” The square-root method of multiple
correlation is thus seen to be related to the diagonal method of factoring
([11], pp. 52-59).

Summerfield and Lubin, though their presentation is complete in other 
respects, never point out that their method is computationally equivalent 
to Wherry’s method. The present article is designed to show the relationships 
among the three methods in terms of their use in selecting an adequate 
predictor-set from a larger pool of predictors. 




The methods to be considered represent two distinct approaches to the 
multiple correlation problem. In a vectorial representation of the Doolittle 
method, the criterion’s vector is projected onto the plane, or hyperplane, 
spanned by the predictors; appropriate weights can then be developed for the 
predictors as the oblique coordinates along the predictors’ vectors. Most 
commonly, those predictors that reflect high, positive weights are selected. 
The same projection of the criterion’s vector is implied by the other two 
methods of multiple correlation. The predictor vectors are used sequentially 
in the analysis, however, so that each reflects the highest relationship with the 
criterion at a given stage of selection. The consequent relation of a predictor 
to the criterion is indicated, not by the oblique coordinates, but by the 
orthogonal coordinates, since each predictor in turn is rotated to a position 
orthogonal to all previously selected predictors. Despite some procedural 
differences the two approaches have the purpose of determining the portion 
of the criterion’s variance that is correlated with a composite of the predictors’ 
variances, in a least squares sense, as implied by the concept of projection. 
A comparison of the three methods will be made using data from a predictive 
study of USAF radiotelegrapher trainees.* 


Method 
The Data 


The data for the analyses reported herein were taken from an Air Force 
study of 310 radiotelegrapher students. The criterion variable in the study 
was the number of days training required for a trainee to learn to receive 
14-MGPM (i.e., 14 multiple groups per minute) in Morse code. The fourteen 
variables used for the prediction of the criterion are as follows. 


1. Rhythm 4. Given 30 pairs of rhythmic patterns consisting of dots and dashes, 
identify which pairs are the same and which are different. 

2. Four-Letter Words. Identify all four-letter words in an unbroken series of letters. 

3. Mutilated Words. Given five choices for each of 25 items, select the “clear” word
which is closest in meaning to the given mutilated word. 

4. Dot Perception. In 50 five-signal groups, containing rapid patterns of dots and 
dashes, identify the number of dots in each group.

5. Code Distraction. This test presents the same task as Dot Perception except that 
additional irrelevant signals are presented in the background. 

6. Copying Behind. Groups of 15 digits, one through five, are presented in order of 
increased celerity; identify each digit on a standard IBM answer sheet.

7. Hidden Tunes. Determine whether a presented tune is included in a longer second 
tune. 

8. Army Radio Code Test (ARCT). A measure of the speed with which an examinee 
can learn code characteristics. 

9. Radio Operator Biographical Information (Radio Operator B I). Personal history 
items related to radio operator activities. 


*The study was completed in connection with Contract AF 41(657)-109. Appreciation 
is expressed to Dr. Edwin A. Fleishman for the basic data used in the example. 
















10. Word Knowledge. A multiple-choice vocabulary test. 

11. Arithmetic Reasoning. A series of verbally stated problems requiring arithmetic 
operations. 

12. Dial and Table Reading. Requires reading, identifying, and interpolating various
dial positions and complex information in tables.

13. Numerical Operations. Requires the solution of simple numerical problems. 

14. Memory for Landmarks. A paired-association test requiring geographical features 
to be matched with appropriate names. 


More complete descriptions of the tests can be found in Fleishman [9] and 
Guilford [12]. 


The Analyses 


Pearson product moment correlations were obtained among all 15 
variables used in the study. The correlations were corrected for restriction 
in range [13], since the procedures used for selection of radiotelegrapher’s 
school trainees tended to restrict the variability of the sample as compared 
with unselected samples. The signs of the correlations of tests with the criterion 
were reflected because the original negative coefficients were interpreted as 
indicating a positive relationship; i.e., in general, the better a given trainee’s
score on a given predictor, the fewer days of training required for
that trainee to attain the criterion.


TABLE 1

Intercorrelations of the Predictor Variables and the Criterion Variable
(All entries to three decimal places)

Variable                       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
 1. Rhythm 4
 2. Four-Letter Words        081
 3. Mutilated Words          116  463
 4. Dot Perception           340  178  227
 5. Code Distraction         300  116  130  823
 6. Copying Behind           261  208  320  457  447
 7. Hidden Tunes             404  160  212  432  420  319
 8. ARCT                     177  097  163  470  426  462  215
 9. Radio Operator B I       022  197  298  134  090  225  126  182
10. Word Knowledge           073  ?30  434  278  254  337  205  265  410
11. Arithmetic Reasoning     067  243  355  294  254  448  158  386  394  554
12. Dial and Table Reading   072  353  418  314  288  526  248  433  394  517  645
13. Numerical Operations     080  277  387  193  222  441  206  266  253  442  560  590
14. Memory for Landmarks     013  ???  303  221  ???  332  210  292  336  389  446  498  355
15. 14-MGPM                  211  194  298  389  353  395  212  273  161  270  245  232  308  178










Results 


The intercorrelations among the 15 variables are shown in Table 1. 
The analyses in the present study were computed from these values. 


The Doolittle Analysis 

Four multiple correlations were obtained using the Doolittle method. 
The validities, beta weights, and multiple correlation values are presented 
in Table 2. 


TABLE 2

The Validities and the Corresponding Beta Weights
for the Variables Involved in Each of Four Multiple
Correlations with the 14-MGPM Criterion Using the Doolittle Method
(Decimal points have been omitted)

                                              Beta weights
Variable                     Validity      I      II     III     IV
 1. Rhythm 4                    211
 2. Four-Letter Words           194
 3. Mutilated Words             298                     1691
 4. Dot Perception              389                     2463
 5. Code Distraction            353
 6. Copying Behind              395
 7. Hidden Tunes                212
 8. ARCT                        273
 9. Radio Operator B I          161
10. Word Knowledge              270
11. Arithmetic Reasoning        245
12. Dial and Table Reading      232
13. Numerical Operations        308
14. Memory for Landmarks        178

$R^2$





The first multiple correlation includes all 14 predictors; in this multiple 
correlation 26.99 percent of the criterion’s variance is predictable from its 
relationships with the 14 predictors. The 4 variables with the highest positive 
beta weights are Copying Behind, Dot Perception, Numerical Operations, 
and Mutilated Words. These 4 variables, when used in the second multiple 
correlation, accounted for 24.84 percent of the criterion’s variance, which 
indicates that elimination of the remaining 10 predictors resulted in a loss
of merely 2.15 percent of the criterion’s predictable variance. The F test
([13], p. 400),

(1)   $F = \dfrac{(R_1^2 - R_2^2)(N - m_1 - 1)}{(1 - R_1^2)(m_1 - m_2)}, \qquad df_1 = m_1 - m_2, \qquad df_2 = N - m_1 - 1,$

testing the difference between the first and second multiple correlation,
yields a value of .8687 which, for 10 and 295 degrees of freedom, is clearly
not significant at the .05 level; in (1), $R_1$ is the multiple correlation with the
set of $m_1$ predictor variables, $R_2$ is the multiple correlation with $m_2 < m_1$
of the same predictor variables, N is sample size, $df_1$ is the degrees of freedom
for the numerator, and $df_2$ is the degrees of freedom for the denominator.
The further elimination of Mutilated Words and Numerical Operations,
which have the lowest beta weights in the second multiple correlation,
leaves a multiple correlation of .4593 (not shown in Table 2), which, by
the application of (1), differs from the first multiple correlation at the .05
level (F = 1.9832, $df_1$ = 12, and $df_2$ = 295) and suggests that at least one
of these predictors should be retained in the final set of selected predictors.
The third and fourth multiple correlations, respectively, provide for the
elimination of Numerical Operations and Mutilated Words. Neither the third
nor fourth multiple correlation is significantly different from the first multiple
correlation (i.e., for 11 and 295 degrees of freedom, respectively, F = 1.2342
and F = 1.3224), but the third multiple, being the higher of the two, will
be used to represent the final set of predictor variables selected in the Doolittle
analysis. This set of predictor variables consists of Copying Behind, Dot
Perception, and Mutilated Words.
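
As an illustration only, the F ratio in (1) can be checked with a few lines of Python. The function name `f_test_nested` and its argument names are hypothetical; the numerical inputs are the squared multiple correlations quoted in the text (.2699 for all 14 predictors, .2484 for the 4 retained predictors) and N = 310.

```python
def f_test_nested(r2_full, r2_reduced, n, m_full, m_reduced):
    """F ratio of equation (1) comparing two nested multiple correlations.

    r2_full and r2_reduced are squared multiple correlations; the reduced
    set of m_reduced predictors is a subset of the m_full predictors.
    """
    df1 = m_full - m_reduced          # numerator degrees of freedom
    df2 = n - m_full - 1              # denominator degrees of freedom
    f = (r2_full - r2_reduced) * df2 / ((1.0 - r2_full) * df1)
    return f, df1, df2


# First versus second multiple correlation, values as reported in the text.
f, df1, df2 = f_test_nested(0.2699, 0.2484, n=310, m_full=14, m_reduced=4)
print(round(f, 4), df1, df2)
```

With these inputs the ratio agrees with the reported .8687 on 10 and 295 degrees of freedom. The later comparisons in the text can be checked the same way, although small discrepancies are to be expected because the published squared correlations are rounded.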


The Wherry-Doolittle Analysis 


The Wherry-Doolittle analysis begins with the selection of the most 
valid predictor; other predictors are added, one at a time, until the multiple 
correlation, corrected for shrinkage [25], exhibits no further appreciable 
increment. Tables 3, 4, and 5 present the interim calculations necessary for 
carrying out the Wherry-Doolittle analysis, while Table 6 presents the 
multiple correlation computations and values. There is a column for each 
of the 14 predictors in Tables 4 and 5; Table 3 also includes a column for 
variable 15, the criterion. Unity is placed in each column of row $Z_1$ in Table
4 to represent the total variance in each of the variables. Dashes used throughout
row $a_1$ in Table 3 represent “no entry.” The validity coefficient of each
predictor is placed in its respective column of row $V_1$ in Table 5.

The Copying Behind test has the highest validity and, therefore, is the
first predictor selected in the Wherry-Doolittle analysis. The values in row
$c_1$ of Table 3 are the result of dividing the respective elements $b_{1,i}$
($i = 1, 2, \ldots, 15$) by $-b_{1,6}$.







TABLE 3 


Interim Calculations for the Wherry-Doolittle Method 
(All entries to three decimal places. Asterisks indicate variables selected for the predictor-variable set.) 








Variable 








457 
-208 +320 457 


083 4081 791 


-105 


463 1.000 227 
388 «6890 —* 


-436-1.000 * 





The values for Tables 4 and 5 are now computed from the values in
Table 3. The values for row $Z_2$ in Table 4 are, for each column respectively,
$Z_{2,i} = Z_{1,i} + b_{1,i}\,c_{1,i}$ ($i = 1, 2, \ldots, 14$). The values for row $V_2$ in Table 5,
for each column respectively, are computed as $V_{2,i} = V_{1,i} + b_{1,15}\,c_{1,i}$
($i = 1, 2, \ldots, 14$).

The next variable selected is that predictor for which $V_2^2/Z_2$ is largest.
The variable with the largest such ratio is variable 4, Dot Perception, with
$V_2^2/Z_2$ equal to .0547. The value .0547 represents the amount of variance
that Dot Perception adds to the prediction of the criterion variable in conjunction
with and selected after the Copying Behind test. The “reduced”
multiple correlation, $\bar{R}$, corrected for shrinkage [cf. 25], as shown in Table
6, increased by more than .04 with the addition of Dot Perception, suggesting
that the selection process should be continued to determine whether additional


TABLE 4

Interim Calculations for the Wherry-Doolittle Method
(All entries to three decimal places. Asterisks indicate variables selected for the predictor-variable set.)

                               Variable
Row         2      3      4      5      6      7      8      9     10
$Z_1$   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
$Z_2$     957    898    791    800      *    898    787    949    886
$Z_3$     948    890      *    315      *    794    702    948    867
$Z_4$     779      *      *    308      *    787    702    892    756









TABLE 5 


Interim Calculations for the Wherry-Doolittle Method 
(All entries to three decimal places. Asterisks indicate variables selected for the predictor-variable set.)








Variable 
8 











variables will contribute something more to the prediction of the criterion. 
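
The second selection step can be sketched in Python as a rough check. The sketch assumes the usual reading of the rows described above, namely that after Copying Behind (variable 6) is selected, $Z_2$ for predictor i equals $1 - r_{i6}^2$ and $V_2$ equals $r_{ic} - r_{i6} r_{6c}$. The correlations are copied from Table 1 with decimal points restored; the names `r_with_6`, `validity`, and `second_stage_ratio` are illustrative only.

```python
# Correlations of each remaining predictor with Copying Behind (variable 6)
# and with the criterion, taken from Table 1.
r_with_6 = {1: .261, 2: .208, 3: .320, 4: .457, 5: .447, 7: .319, 8: .462,
            9: .225, 10: .337, 11: .448, 12: .526, 13: .441, 14: .332}
validity = {1: .211, 2: .194, 3: .298, 4: .389, 5: .353, 7: .212, 8: .273,
            9: .161, 10: .270, 11: .245, 12: .232, 13: .308, 14: .178}
r_6c = .395  # validity of Copying Behind, the first predictor selected


def second_stage_ratio(i):
    """V_2^2 / Z_2 for predictor i once variable 6 has been partialled out."""
    z2 = 1.0 - r_with_6[i] ** 2              # variance of i not shared with 6
    v2 = validity[i] - r_with_6[i] * r_6c    # residual covariance with criterion
    return v2 ** 2 / z2


ratios = {i: second_stage_ratio(i) for i in r_with_6}
best = max(ratios, key=ratios.get)
print(best, round(ratios[best], 4))
```

Under these assumptions the largest ratio belongs to variable 4, Dot Perception, at about .055, which is close to the .0547 quoted in the text; the small difference presumably reflects rounding in the tabled correlations.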

The remaining rows in Tables 3, 4, and 5 are computed similarly and
further predictors are selected in the order shown; these variables are,
respectively, Mutilated Words and Numerical Operations. The fourth selected
predictor, Numerical Operations, adds less than .01 to the value of the reduced
squared multiple correlation. The decision to retain or eliminate a predictor
is to some extent a judgmental matter where its contribution to the reduced
squared multiple correlation is small. Although some writers (e.g., [21], p. 250)
recommend the retention of any predictor that increases the shrunken
multiple correlation, it is the opinion of the present writers that if its
contribution to $\bar{R}^2$ is as small as .01 or less, there is little value in retaining the
predictor. The predictors selected in the Wherry-Doolittle analysis, then,
are as follows: Copying Behind, Dot Perception, and Mutilated Words.
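
The stopping rule can be illustrated with a short Python sketch. It assumes that the shrinkage correction multiplies the unexplained variance by $(N - 1)/(N - m)$, which is the factor the surviving entries of Table 6 (1.0000, 1.0032, 1.0065, 1.0098) appear to represent. The squared multiple correlations are taken from the paper: .1560 and .2109 from Table 8, .2363 as one minus the residual .7637 in Table 11, and .2484 from the Doolittle analysis with four predictors. The function name `shrunken_r2` is hypothetical.

```python
N = 310
# Squared multiple correlations after 1, 2, 3, and 4 selected predictors
# (Copying Behind, Dot Perception, Mutilated Words, Numerical Operations).
r2_by_step = [0.1560, 0.2109, 0.2363, 0.2484]


def shrunken_r2(r2, n, m):
    """Reduced squared multiple correlation for m predictors.

    Assumes the correction factor (n - 1) / (n - m) inferred from Table 6.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m)


shrunken = [shrunken_r2(r2, N, m) for m, r2 in enumerate(r2_by_step, start=1)]
gains = [later - earlier for earlier, later in zip(shrunken, shrunken[1:])]

for m, value in enumerate(shrunken, start=1):
    print(m, round(value, 4))
print([round(g, 4) for g in gains])
```

On this reading the second predictor raises the reduced value by more than .04 and the fourth by slightly less than .01, in line with the decision described above to stop after three predictors.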


The Summerfield-Lubin Analysis 

The Summerfield-Lubin analysis begins with a multiple correlation 
predicting the criterion from all of the predictors. The multiple correlation 
is developed using the square-root method of factoring and the procedures 


TABLE 6

Calculations for the Multiple Correlational
Values in the Wherry-Doolittle Method

No. of predictors (m)     (N - 1)/(N - m), N = 310
        1                        1.0000
        2                        1.0032
        3                        1.0065
        4*                       1.0098

*Predictor 13, Numerical Operations, was rejected from the set of selected predictors
since it added less than .01 to the value of $\bar{R}^2$.







outlined by Summerfield and Lubin [19, 22]; the resulting matrix is shown 
in Table 7. The test of significance for the multiple correlation using all 
predictors (i.e., the first F test in Table 8) indicates that this value is sig- 
nificant at the .01 level. 

In obtaining an adequate set of predictors by the Summerfield-Lubin 
method the most valid predictor, the Copying Behind test, is selected first. 
The first-order semi-partial correlations [see 2] resulting from the insertion 
of each of the other 13 predictors into the factor space after the Copying 
Behind test are shown in Table 9. The Dot Perception test has the highest 
semi-partial correlation and, consequently, is the second predictor selected. 
The second F test in Table 8 indicates that Dot Perception contributes signifi- 
cantly to the multiple correlation problem in conjunction with the Copying 
Behind test. 

The semi-partial correlation differs from the partial correlation only 
in terms of the denominator; the former uses the square root of the co- 
efficient of nondetermination while the latter requires the use of the geometric 
mean of two error variances. DuBois [4] refers to the former values as part 
correlations. 
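
The distinction can be made concrete with the Dot Perception example. The following Python sketch uses the standard first-order formulas, which are assumed here to be the ones intended, together with the Table 1 correlations among Dot Perception (4), Copying Behind (6), and the criterion (c); the variable names are illustrative.

```python
import math

# Table 1 correlations (decimal points restored).
r_c4 = 0.389   # Dot Perception with the criterion
r_c6 = 0.395   # Copying Behind with the criterion
r_46 = 0.457   # Dot Perception with Copying Behind

numerator = r_c4 - r_c6 * r_46

# Semi-partial (part) correlation: only the predictor is residualized on 6,
# so the denominator is the square root of one coefficient of nondetermination.
semi_partial = numerator / math.sqrt(1.0 - r_46 ** 2)

# Partial correlation: both the predictor and the criterion are residualized
# on 6, so the denominator is the geometric mean of two error variances.
partial = numerator / math.sqrt((1.0 - r_46 ** 2) * (1.0 - r_c6 ** 2))

print(round(semi_partial, 4), round(partial, 4))
```

The semi-partial value comes out near .2344, and its square, about .055, is the increment in the squared multiple correlation attributable to Dot Perception; the partial correlation is necessarily at least as large in absolute value.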

Tables 10 and 11, respectively, contain the second- and third-order 
semi-partial correlation values; Mutilated Words and Numerical Operations 


TABLE 7

The Square-root Matrix, $T_{ij}$, and the Cumulated $R^2_{15.1,2,\ldots,14}$ for the Initial
Multiple Correlation in the Square-root Analysis
(All entries to four decimal places)

Variable                          7       8
 1. Rhythm 4
 2. Four-Letter Words
 3. Mutilated Words
 4. Dot Perception
 5. Code Distraction
 6. Copying Behind
 7. Hidden Tunes                0000
 8. ARCT                        0000
 9. Radio Operator B I          0000
10. Word Knowledge              0000
11. Arithmetic Reasoning        7530
12. Dial & Table Reading        2590
13. Numerical Operations        2772
14. Memory for Landmarks        1580
15. 14-MGPM                     0484   -0182

Cumulated $R^2_{15.1,2,\ldots,14}$   2468    2472










TABLE 8

Tests of Significance for Multiple Correlations
Obtained in the Square-root Analysis

                                                                                   Degrees
Null                                      Source of                                of         Sums of    Mean
hypothesis                                variation                                freedom    squares    squares

$R^2_{c.1,2,\ldots,14} = 0$               $R^2_{c.1,2,\ldots,14}$                  14         .2701      .0193
                                          Residual                                            .7299      .0025
                                          Total                                               1.0000

$R^2_{c.6} = R^2_{c.6,4}$                 $R^2_{c.6}$                                         .1560
                                          $R^2_{c.6,4} - R^2_{c.6}$                           .0549
                                          Residual                                            .7891
                                          Total                                               1.0000

$R^2_{c.6,4} = R^2_{c.6,4,3}$             $R^2_{c.6,4}$                                       .2109
                                          $R^2_{c.6,4,3} - R^2_{c.6,4}$
                                          Residual
                                          Total

$R^2_{c.6,4,3,13} = R^2_{c.6,4,3}$        $R^2_{c.6,4,3}$
                                          $R^2_{c.6,4,3,13} - R^2_{c.6,4,3}$
                                          Residual
                                          Total

$R^2_{c.6,4,3} = R^2_{c.1,2,\ldots,14}$   $R^2_{c.6,4,3}$
                                          $R^2_{c.1,2,\ldots,14} - R^2_{c.6,4,3}$
                                          Residual
                                          Total

*Significant at the .01 level.
**Significant at the .05 level.
***Not significant at the .05 level.


TABLE 9

Interim Calculations for Selecting the Second Predictor Variable in the Square-root Analysis
(All entries to four decimal places)

                                      Calculations*
Variable                         A       B       C       D       E       F
 1. Rhythm 4                   2610
 2. Four-Letter Words          2080
 3. Mutilated Words            3200
 4. Dot Perception             4570
 5. Code Distraction           4470
 6. Copying Behind           1.0000
 7. Hidden Tunes               3190
 8. ARCT                       4620
 9. Radio Operator B I         2250
10. Word Knowledge             3370
11. Arithmetic Reasoning       4480
12. Dial & Table Reading       5260
13. Numerical Operations       4410
14. Memory for Landmarks       3320
15. 14-MGPM                    3950

*A = $r_{i6}$;  B = $r^2_{i(i.6)}$;  C = $r_{i(i.6)}$;  D = $r_{ic}$;  E = $r_{ic} - r_{i6} r_{6c}$;  F = $r_{c(i.6)}$.










TABLE 10

Interim Calculations for Selecting the Third Predictor
Variable in the Square-root Analysis
(All entries to four decimal places)

                                      Calculations*
Variable                         A       B       C       D       E       F
 1. Rhythm 4                   3400    2481    8703    0497    9329    0533
 2. Four-Letter Words          1780    0932    9480    0900    9737    0924
 3. Mutilated Words            2270    0908    8894    1513    9431    1604
 4. Dot Perception           1.0000    8895    0000    0000    0000    0000
 5. Code Distraction           8230    6956    3163    0134    5624    0238
 6. Copying Behind             4570    0000    0000    0000    0000    0000
 7. Hidden Tunes               4320    3218    7946    0106    8914    0119
 8. ARCT                       4700    2911    7019    0223    8378    0266
 9. Radio Operator B I         1340    0351    9482    0639    9738    0656
10. Word Knowledge             2780    1394    8670    1042    9311    1119
11. Arithmetic Reasoning       2940    1004    7892    0445    8884    0501
12. Dial & Table Reading       3140    0827    7165    0048    8465    0057
13. Numerical Operations       1930    0096    8054    1361    8974    1517
14. Memory for Landmarks       2210    0779    8837    0286    9401    0304
15. 14-MGPM                    3890    2344    7891    7891    8883    8883

*A = $r_{i4}$;  B = $r_{i(4.6)}$;  C = $r^2_{i(i.6)} - r^2_{i(4.6)}$;
 D = $(r_{ic} - r_{i6} r_{6c}) - r_{c(4.6)} r_{i(4.6)}$;  E = $r_{i(i.6,4)}$;  F = $r_{c(i.6,4)}$.


were selected, in that order, as a result of these computations. The reader 
can see, however, from the tests of significance in Table 8 that inclusion of 
Numerical Operations adds little to the correlational relationship and that the 
multiple correlation predicting the criterion from Copying Behind, Dot 
Perception, and Mutilated Words does not differ significantly from the one 
involving all 14 predictors. The latter 3 predictors, therefore, are sufficient 
according to the principles outlined by Summerfield and Lubin. 
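
For readers who wish to check the square-root computations, the following Python sketch factors the correlation matrix of the three selected predictors and the criterion into a lower-triangular matrix T with R = TT', which is the Cholesky factorization. This is offered as an assumed equivalent of the square-root method rather than as the authors' worksheet; the function name `square_root_factor` is hypothetical, and the correlations come from Table 1.

```python
import math


def square_root_factor(r):
    """Return lower-triangular T such that R = T T' (Cholesky factoring)."""
    n = len(r)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(t[i][k] * t[j][k] for k in range(j))
            if i == j:
                t[i][j] = math.sqrt(r[i][i] - s)   # diagonal element
            else:
                t[i][j] = (r[i][j] - s) / t[j][j]  # projection on earlier axis
    return t


# Order of insertion: Copying Behind (6), Dot Perception (4),
# Mutilated Words (3), and last the criterion (14-MGPM).
R = [[1.000, 0.457, 0.320, 0.395],
     [0.457, 1.000, 0.227, 0.389],
     [0.320, 0.227, 1.000, 0.298],
     [0.395, 0.389, 0.298, 1.000]]

T = square_root_factor(R)
projections = T[3][:3]   # semi-partial correlations of the criterion
cumulated_r2 = [sum(p * p for p in projections[:k + 1]) for k in range(3)]

print([round(p, 4) for p in projections])
print([round(v, 4) for v in cumulated_r2])
```

The criterion row of T holds the successive semi-partial correlations, and their cumulated squares come out near .156, .211, and .236, in reasonable agreement with the sums of squares in Table 8 and the residual in Table 11. Differences in the third or fourth decimal are expected because the inputs are rounded to three places.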


TABLE 11

Interim Calculations for Selecting the Fourth Predictor
Variable in the Square-root Analysis
(All entries to four decimal places)

                                      Calculations*
Variable                         A       B       C       D       E       F
 1. Rhythm 4                   1160    0106    8702    0480    9328    0515
 2. Four-Letter Words          4630    4114    7788    0244    8825    0276
 3. Mutilated Words          1.0000    9431    0000    0000    0000    0000
 4. Dot Perception             2270    0000    0000    0000    0000    0000
 5. Code Distraction           1300   -0762    3105    0255    5572    0458
 6. Copying Behind             3200    0000    0000    0000    0000    0000
 7. Hidden Tunes               2120    0858    7872   -0031    8872   -0035
 8. ARCT                       1630   -0121    7018    0242    8377    0289
 9. Radio Operator B I         2980    2362    8924    0262    9447    0277
10. Word Knowledge             4340    3324    7565    0512    8693    0589
11. Arithmetic Reasoning       3550    2147    7431    0103    8620    0119
12. Dial & Table Reading       4180    2568    6506    0361    8066    0448
13. Numerical Operations       3870    2617    7369    0944    8584    1100
14. Memory for Landmarks       3030    2011    8433    0035    9183    0038
15. 14-MGPM                    2980    1594    7637    7637    8739    8739

*A = $r_{i3}$;  B = $r_{i(3.6,4)}$;  C = $r^2_{i(i.6,4)} - r^2_{i(3.6,4)}$;
 D = $(r_{ic} - r_{i6} r_{6c}) - r_{c(4.6)} r_{i(4.6)} - r_{c(3.6,4)} r_{i(3.6,4)}$;  E = $r_{i(i.6,4,3)}$;  F = $r_{c(i.6,4,3)}$.
















Discussion 
The Predictor Sets 


The set of predictors selected from the entire group of 14 predictors 
was the same in each of the three analyses. Some argument existed for con- 
tinuing the Wherry-Doolittle analysis to include the Numerical Operations 
test because it contributed to the obtained relationship, but the additional 
contribution was so small as to be judged negligible. Moreover, the results 
of the F tests in the other two analyses indicated that the contribution of 
this variable was not significant at the .05 level. Summerfield and Lubin 
([22], pp. 278 and 282) point out that the F test is not an exact test in this
situation and should be used as a decision tool rather than as a strict test 
of significance. The present writers suggest a similar use for the reduced 
squared multiple correlation value in the Wherry-Doolittle analysis. 


The Doolittle and Wherry-Doolittle Analyses 


The Doolittle method of multiple correlation merely represents a con- 
venient least squares solution for the multiple correlation problem. The 
simultaneous equations involved for a four-variable problem, for instance, 
are as follows: 


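(2)   $\beta_1 + \beta_2 r_{12} + \beta_3 r_{13} = r_{1c}$,
      $\beta_1 r_{21} + \beta_2 + \beta_3 r_{23} = r_{2c}$,
      $\beta_1 r_{31} + \beta_2 r_{32} + \beta_3 = r_{3c}$.

The squared multiple correlation is, then,

(3)   $R^2_{c.123} = \beta_1 r_{1c} + \beta_2 r_{2c} + \beta_3 r_{3c}$.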


The scheme for computing the Doolittle analysis is shown in Figure 1a
for a four-variable problem. The rows $r_{ij}$ ($i = 1, 2, 3$; $j = 1, 2, 3, c$) may
each be viewed as beginning a new group of operations, each group of operations
ending with the rows $z_{ij}$ ($i = 1, 2, 3$; $j = 1, 2, 3, c$). (The conventional
process, e.g., [13], pp. 406-409, has been to divide the $z_{ij}$ elements by the
negative value of the element immediately above $z_{ii}$, producing, generally,
negative values throughout $z_{ij}$; these negative values, however, are for
computational convenience and are not a necessary part of the theory.
For our purposes the $z_{ij}$ values will be considered positive.) The “forward”
solution provides the $z_{ij}$ values, while the “back” solution uses the $z_{ij}$ values
for determining the $\beta_i$ ($i = 1, 2, 3$) weights.

(4)   $\beta_3 = z_{3c}$,
      $\beta_2 = z_{2c} - z_{23}\beta_3$,
      $\beta_1 = z_{1c} - z_{12}\beta_2 - z_{13}\beta_3$.
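
The forward and back solutions described above can be sketched in Python for the three selected predictors. This is a minimal illustration using the positive sign convention adopted in the text, not a reproduction of the worksheet in Figure 1a; the function name `doolittle` is hypothetical, and the correlations are those of Copying Behind, Dot Perception, and Mutilated Words from Table 1.

```python
def doolittle(r, r_c):
    """Forward solution (rows z_i) and back solution (beta weights)."""
    n = len(r_c)
    a = [row[:] + [rc] for row, rc in zip(r, r_c)]   # augmented matrix [R | r_c]
    z = []
    for i in range(n):
        pivot = a[i][i]
        z_row = [x / pivot for x in a[i]]            # divide by leading element
        z.append(z_row)
        for k in range(i + 1, n):                    # remove variable i from later rows
            factor = a[k][i]
            a[k] = [x - factor * zx for x, zx in zip(a[k], z_row)]
    beta = [0.0] * n
    for i in reversed(range(n)):                     # back solution, as in (4)
        beta[i] = z[i][n] - sum(z[i][j] * beta[j] for j in range(i + 1, n))
    return z, beta


# Predictors in the order Copying Behind (6), Dot Perception (4),
# Mutilated Words (3); r_c holds their validities.
R = [[1.000, 0.457, 0.320],
     [0.457, 1.000, 0.227],
     [0.320, 0.227, 1.000]]
r_c = [0.395, 0.389, 0.298]

z, beta = doolittle(R, r_c)
r_squared = sum(b * v for b, v in zip(beta, r_c))    # equation (3)
print([round(b, 4) for b in beta], round(r_squared, 4))
```

With these rounded inputs the beta weights for Dot Perception and Mutilated Words come out near .2463 and .1690, close to the two beta weights that survive in Table 2, and the squared multiple correlation is about .236.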





Figure 1a
The Computational Format for a Four-Variable Doolittle Analysis

Figure 1b
The $z_{ij}$ Elements from Figure 1a


Figure 1b presents the matrix of z;; values extracted from the computational 
format in Figure 1a; reference to it will be made later on in the discussion. 

The Wherry-Doolittle method is based on the computations involved
in the Doolittle analysis except that partial regression weights are not computed,
though they may be obtained after the selection procedure is completed;
the result at each step, then, is a value which is the product of a validity and
a weight. Because of the modified operations, principles involved in the
back solution of the Doolittle method are used first with the original correlation
coefficients. The elements used in the back solution (i.e., the $z_{ij}$ values),
however, were obtained by dividing the immediately preceding rows by the
leading coefficient in those rows; this part of the forward solution is used
after the appropriate parts of the back solution have been completed. Another
important aspect of the Wherry-Doolittle method is that since the partial
regression weights are not computed, each contribution is made up of a
result times itself so that the numerator in the Wherry-Doolittle analysis
may be squared before it is divided by the appropriate part of the forward
solution. The Wherry-Doolittle method, then, since it is based on the Doolittle
method, represents a least squares solution to the multiple correlation
problem. In addition, a correction is applied to the multiple correlation
to yield a “shrunken” multiple correlation ($\bar{R}$), corrected for the chance
error added by each predictor [25].


The Doolittle and the Summerfield-Lubin Analyses 


The Summerfield-Lubin method, based on the square-root method of
multiple correlation, requires that the system of predictors be transformed
into an orthogonal structure; the criterion is inserted last in the matrix so
that the multiple correlation for predicting the criterion is obtained by
squaring the projections of the criterion on the orthogonal vectors of the
predictors. The triangular matrix obtained by the square-root method, for a
four-variable problem, is presented in Figure 2.

Figure 2
The Triangular Matrix, T, for a Four-Variable Square-Root Analysis

It is easily shown that the matrix T in Figure 2 differs from the transpose
of the matrix Z in Figure 1b only in the diagonal elements, so that dividing
each column in T by its diagonal element, $t_{ii}$, yields

(5)   $\dfrac{1}{t_{ii}}\,T = Z'$.
The appropriate least squares beta weights, as Lubin and Summerfield
show, are then computable from the matrix T as well as from the matrix Z.
The relationship in (5) provides a simple bridge between least squares theory
and orthogonal transformations. Any values obtainable by the square-root
method are obtainable by the Doolittle method, and conversely. The reader
can note, for instance, the numerical equivalence of the $R^2$ obtained in the
first multiple correlation in Table 2 and the cumulated $R^2$ in Table 7, the
difference of .0002 being due to rounding error.

Many beta weights, in fact, can be computed indirectly from the completed
triangular matrix in the square-root analysis. [For instance, to complete
the matrix in Figure 2, merely compute the value $t_{cc}^2 = 1 - (t_{c1}^2 + t_{c2}^2 + t_{c3}^2)$.]
If the intercorrelation matrix, R, is factored into the square-root matrix, T,

(6)   $R = TT'$,

then, by definition,

(7)   $R^{-1} = (T')^{-1}(T^{-1})$.

The computation of the inverse of a triangular matrix is much simpler and
more straightforward than the computation for a square symmetric matrix;
the complete procedure was presented by Anderson and Fruchter [1], although
the theory, together with other uses of the square-root matrix, has been
presented in several previous works [e.g., 5, 10]. The inverse matrix has some
interesting properties, as is well known, such that $r^{ii}$ (the ith diagonal
element in the inverse matrix) is the reciprocal of the coefficient of
nondetermination when predicting the variable i from the n − 1 remaining
variables in R; then

(8)   $r^{ii} = \dfrac{1}{K_i^2}$,

where

(9)   $K_i^2 = 1 - R_i^2$,

in which $R_i^2$ is the squared multiple correlation predicting the variable i
and $K_i^2$ is the corresponding coefficient of nondetermination. Also, if the
inverse element in the row i and column j, $r^{ij}$, is divided by the negative
value of its diagonal element, then,

(10)   $\dfrac{r^{ij}}{-r^{ii}} = \beta_{ij}$,

where $\beta_{ij}$ is the jth beta weight for predicting the ith variable, and these
beta weights may then be used for obtaining the multiple correlations for
predicting each variable from the n − 1 remaining variables in the matrix.
The beta weights computed by (10) are the same as those that would be computed
in an appropriate Doolittle analysis.
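
Relations (6) through (10) can be illustrated with a small Python sketch: the correlation matrix is factored, the triangular factor is inverted, and the inverse of R is assembled from it. The helper names `square_root_factor`, `invert_lower_triangular`, and `inverse_from_factor` are hypothetical, and the data are the Table 1 correlations for Copying Behind, Dot Perception, Mutilated Words, and the criterion.

```python
import math


def square_root_factor(r):
    """Lower-triangular T with R = T T', as in (6)."""
    n = len(r)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(t[i][k] * t[j][k] for k in range(j))
            t[i][j] = math.sqrt(r[i][i] - s) if i == j else (r[i][j] - s) / t[j][j]
    return t


def invert_lower_triangular(t):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(t)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / t[j][j]
        for i in range(j + 1, n):
            s = sum(t[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / t[i][i]
    return inv


def inverse_from_factor(t):
    """R^{-1} = (T')^{-1} (T^{-1}), as in (7)."""
    t_inv = invert_lower_triangular(t)
    n = len(t)
    return [[sum(t_inv[k][i] * t_inv[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


R = [[1.000, 0.457, 0.320, 0.395],
     [0.457, 1.000, 0.227, 0.389],
     [0.320, 0.227, 1.000, 0.298],
     [0.395, 0.389, 0.298, 1.000]]

R_inv = inverse_from_factor(square_root_factor(R))
c = 3                                                 # index of the criterion
k2 = 1.0 / R_inv[c][c]                                # (8): coefficient of nondetermination
r2 = 1.0 - k2                                         # (9) rearranged
betas = [R_inv[c][j] / -R_inv[c][c] for j in range(3)]  # (10)
print(round(r2, 4), [round(b, 4) for b in betas])
```

The squared multiple correlation and beta weights obtained this way should agree with those from a Doolittle solution on the same three predictors, which is the equivalence asserted in the text.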

The triangular matrix may also be used to establish simultaneously the
multiple correlational relationship of a set of predictor variables to a number
of criterion variables. DuBois ([4], pp. 72-73) covers this point very well.
Walker and Lev ([24], pp. 331-339) describe the procedure in terms of matrix
operations such that

(11)   $R\beta = C$,

and

(12)   $\beta = R^{-1}C$,

where R is the intercorrelation matrix of predictor variables, $\beta$ is the column
vector matrix of beta weights, and C is the column vector of correlations of
each predictor with each criterion variable. The matrix $R^{-1}$ can be easily
computed from (7), and beta weights can be obtained by inserting the
appropriate criterion-predictor validity correlation coefficients in the column
vector matrix C. Both the Doolittle and square-root methods, however,
obviate the necessity for computation of $R^{-1}$. In using the Doolittle method,
one does the operations in Figure 1, with the appropriate column values for
c, separately for each of the criterion variables. Likewise for the square-root
method, which is in a more compact computational form, one computes
the matrix T, as in Figure 2, for the predictors, and adds each criterion, c,
sequentially to T; the criterion variables are being projected one at a time
onto the space spanned by the predictor variables.


The Summerfield-Lubin and the Wherry-Doolittle Analyses


Summerfield and Lubin ([22], p. 272) list three advantages of their 
method over the Wherry-Doolittle method. The first advantage deals with 
the computation of the multiple correlation using all predictors to justify 
the subsequent selection process; the second concerns the criteria for retaining 
or rejecting a selected predictor in the predictor-set. The third advantage is 
said to be that their computations are based on those in Dwyer’s square-root 
method of multiple correlation so that they “... are more compact, the
coefficients easier to interpret, and the calculations fewer, than for Wherry’s
method” ([22], p. 272). The third advantage listed by Summerfield and
Lubin is not important inasmuch as the procedure and the attending compu- 
tations in the two methods are almost identical; this is evidenced by the 
theoretical relationships indicated previously, and the interim computational 
tables obtained herein for the two methods. The following identities are 
observable between the columns in the tables used for the Summerfield- 
Lubin analysis, and the rows in the tables used for the Wherry-Doolittle 
analysis. 


   The Summerfield-Lubin          The Wherry-Doolittle
         Analysis                       Analysis
    Table      Column              Table       Row
      9          A                   3         $b_1$
      9          B                   4         $Z_2$
      9          D                   5         $V_1$
      9          E                   5         $V_2$
     10          A                   3         $a_2$
     10          C                   4         $Z_3$
     10          D                   5         $V_3$
     11          A                   3         $a_3$
     11          C                   4         $Z_4$
     11          D                   5         $V_4$













The final columns in Tables 9, 10, and 11 of the Summerfield-Lubin
method, as was indicated by the theoretical relationships, contain values
which are the square roots of the $V_i^2/Z_i$ ratios obtained in the Wherry-Doolittle
analysis, so that while the Summerfield-Lubin method requires
that the square root of the denominator be taken in determining $R^2$, the
Wherry-Doolittle method requires that the numerator be squared for determining
that value; the latter operation might be more practical, especially
if the numbers involved are of considerable magnitude, and it is the squared
values that are used directly in the multiple correlation.

Summerfield and Lubin state that the Wherry-Doolittle method involves 
the computation of partial correlations ([22], p. 271), and their reference
([23], p. 202) makes the same statement. The above comparison indicates
that it is not the partial correlations that are obtained in the Wherry-Doolittle 
analysis, but rather the squares of the same semi-partial correlations involved 
in the square-root method of multiple correlation. 


Summary 


Detailed comparisons of three commonly used methods of multiple 
correlation were made in the present study, using data from an Air Force 
study of radiotelegraphy students for illustrative purposes. They were the 
Doolittle, Wherry-Doolittle, and Summerfield-Lubin methods of multiple 
correlation analysis. Wherry’s method and the Summerfield-Lubin analysis 
were shown to be equivalent, and their relationship to the Doolittle analysis 
was indicated. When used for selecting predictor variables, Wherry’s method 
and the Summerfield-Lubin method will select the same set of predictors in 
the same order, provided that the “test” for terminating the selection pro- 
cedure in both analyses is used as a decision tool rather than as an exact test 
of significance; selection by the Doolittle method, as used in the present study, 
should result in the same set of predictors though the order of selection may 
differ slightly from the order in the other two methods. 

The Summerfield-Lubin method and Wherry’s method are least squares 
approaches to the multiple correlation problem as indicated by their relation- 
ship to the Doolittle method. As Guttman [14] states, algebraic arguments 
originating from the same premise must necessarily lead to the same con- 
clusion. Other least squares methods can be shown also to be equivalent to 
the methods in the present study; the Cowles-Crout method [2], for instance, 
is merely a square-root analysis with the interim calculations being recorded. 
Horst’s work [16, 17] also, though he deals with multiple criteria, essentially 
uses the theory of the square-root method of multiple correlation. The 
different methods represent mathematically different approaches for measuring 
the estimate vector, which is a component of the criterion’s vector lying in 
the predictor-vector space. If the estimate vector is viewed as collinear with 
a unit-length vector, which can be written as a linear function of the predictor 



















variables, then the multiple correlation problem reduces to the zero-order 
correlational circumstance. This is why Kendall ([18], pp. 68-69) notes that
multiple correlation is most generally a multivariably univariate condition. 

The square-root method of multiple correlation, because of its ease and 
compactness, and, as Summerfield and Lubin state, because of the clear 
interpretation of the coefficients, is recommended as the best least squares 
method for developing a given multiple correlational relationship. Either the 
Summerfield-Lubin use of the square-root method or Wherry’s method will 
suffice for the selection of predictors, though Wherry’s method, since it
obviates the extraction of a square root, seems slightly easier computationally.
The square-root method is useful also as a step in the computation of inverse 
matrices, and thus facilitates the computations for many other types of 
multiple correlational analyses. 


REFERENCES 


[1] Anderson, H. and Fruchter, B. Procedural studies: the computation of an inverse 
matrix. Res. Guide No. 1. Austin: Psychometric Lab., Dept. Educ. Psychol., Univ. 
Texas, 1957. (Dittoed) 

[2] Cowles, J. T. A labor-saving method of computing multiple correlation coefficients, 
regression weights, and standard errors of regression weights. Lackland Air Force Base: 
Air Training Command, Psychological Research and Examining Unit, 1948. (Mimeo.) 

[3] Doolittle, M. H. Method employed in the solution of normal equations and the adjust- 
ment of a triangulation. Paper No. 3 in Adjustment of the primary triangulation between 
Kent Island and Atlantic base lines. Rep. of the Superintendent, Coast and Geodetic 
Survey, 1878. Pp. 115-120. 

[4] DuBois, P. H. Multivariate correlational analysis. New York: Harper, 1957. 

[5] Durand, D. A note on matrix inversion by the square root method. J. Amer. statist. 
Ass., 1956, 51, 288-292. 

[6] Dwyer, P. S. The Doolittle technique. Ann. math. Statist., 1941, 12, 449-458. 

[7] Dwyer, P. S. The square root method and its use in correlation and regression. J. Amer. 
statist. Ass., 1945, 40, 493-503. 

[8] Dwyer, P. S. Pearsonian correlation coefficients associated with least squares theory. 
Ann. math. Statist., 1949, 20, 404-417. 

[9] Fleishman, E. A., Roberts, M. M., and Friedman, M. P. A factor analysis of aptitude 
and proficiency measures in radiotelegraphy. J. appl. Psychol., 1958, 42, 129-135. 

[10] Fruchter, B. Note on the computation of the inverse of a triangular matrix. Psycho- 
metrika, 1949, 14, 89-93. 

[11] Fruchter, B. Introduction to factor analysis. New York: Van Nostrand, 1954. 

[12] Guilford, J. P. (Ed.) Printed classification tests. AAF Aviation Psychol. Prog. Res. 
Rep. No. 5, 1945. 

[13] Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw- 
Hill, 1956. 

[14] Guttman, L. Comment on Tryon’s formulation of the communality problem. Res. Rep. 
No. 17, Contract AF 41(657)-76, Univ. Calif., Berkeley, 1957. 

[15] Horst, P. A short method for solving for a coefficient of multiple correlation. Ann. 
math. Statist., 1932, 3, 40-45. 

[16] Horst, P. A technique for the development of a differential prediction battery. Psychol. 
Monogr., 1954, 68, No. 9 (Whole No. 380). 

[17] Horst, P. A technique for the development of a multiple absolute prediction battery. 
Psychol. Monogr., 1955, 69, No. 5 (Whole No. 390). 

[18] Kendall, M. G. A course in multivariate analysis. New York: Hafner, 1957. 

[19] Lubin, A. and Summerfield, A. A square root method of selecting a minimum set of 
variables in multiple regression: II. A worked example. Psychometrika, 1951, 16, 
425-437. 

[20] Roff, M. Some properties of the communality in multiple factor theory. Psychometrika, 
1936, 1, 1-6. 

[21] Stead, W. H. and Shartle, C. L. Occupational counseling techniques. New York: Amer- 
ican Book Co., 1940. 

[22] Summerfield, A. and Lubin, A. A square root method of selecting a minimum set of 
variables in multiple regression: I. The method. Psychometrika, 1951, 16, 271-284. 

[23] Thorndike, R. L. Personnel selection: test and measurement techniques. New York: 
Wiley, 1949. 

[24] Walker, H. M. and Lev, J. Statistical inference. New York: Holt, 1953. 

[25] Wherry, R. J. A new formula for predicting the shrinkage of the coefficient of multiple 
correlation. Ann. math. Statist., 1931, 2, 440-451. 


Manuscript received 11/14/58 
Revised manuscript received 7/11/59 

















PSYCHOMETRIKA—VOL, 25, NO. 1 
MARCH, 1960 


TWO-ALTERNATIVE LEARNING SITUATIONS 
WITH PARTIAL REINFORCEMENT* 


Mary I. Hanania† 


UNIVERSITY OF CALIFORNIA, BERKELEY‡ 


The comparative effects of reward and nonreward on learning are con- 
sidered in connection with a two-alternative learning situation. Conditions 
are more general in nature than those discussed in an earlier article. In the 
statistical model proposed, the question of whether reward and nonreward are 
equivalent in their effects on learning reduces to testing a composite hypothesis 
on a multivariate probability distribution. An asymptotic test of this 
hypothesis is described, and its use is illustrated with data from psychological 
experiments. 


1. The Problem 


The type of experiment to which the discussion in [3] refers is that 
in which a subject, faced on every trial with a choice between two responses, 
A and B, receives a reward every time he makes response A, and receives no 
reward when he makes response B. In such an experiment, the subject is 
said to receive “continuous reinforcement.” Experiments indicate that 
after a number of trials he tends to make the reinforced response with greater 
frequency. 

The learning process has also been extensively studied in cases where 
the subject’s responses are only partially reinforced. Here, his A responses are 
rewarded on some trials only and his B responses on some other trials, while 
responses on the remaining trials are left unrewarded. The schedule of rewards 
depends upon some law set up by the experimenter before the trials start. 
In an experiment of this nature, a subject is not always expected to learn 
one response to the complete exclusion of the other; he usually arrives at a 
greater tendency of making one response rather than the other, the strength 
of that tendency depending upon his experience as the experiment progresses. 

The model in this paper considers a subject’s experience only in terms 
of reinforcement of choice. Thus it ignores other aspects of the individual 
subject’s experience that may influence his tendency towards making one or 
the other of the alternative responses. General factors influencing all subjects 
simultaneously are, however, accounted for by the model. 

*This paper was prepared with the support of the Office of Naval Research, 
and may be reproduced in whole or in part for any purpose of the United States government. 

†I wish to express my gratitude to Professors J. Neyman and E. L. Scott of the 
University of California, Department of Statistics, for their constant assistance and 
encouragement throughout the research that led to this paper and during its preparation, 
and to Professor F. W. Irwin of the University of Pennsylvania, Department of Psychology, 
for his many helpful suggestions and comments. 

‡Presently at the American University of Beirut, Beirut, Lebanon. 

To take a specific example, consider an experiment of m trials, where 
each trial consists of two consecutive steps. 

(i) The subject makes a choice between two mutually exclusive re- 
sponses, A or B, as a prediction of the choice to be made by the experimenter. 

(ii) Independently of the subject and of choices made on previous 
trials, the experimenter makes choice A with probability $\pi$, a fixed number, 
and choice B with probability $1 - \pi$. This law is not revealed to the subject. 

The outcome of any trial on such an experiment is therefore one of four 
possible outcomes. 


Outcome Choice made Choice made by 
by subject experimenter 
(1) A A 
(2) A B 
(3) B A 
(4) B B 


The subject is rewarded when his prediction of the experimenter’s 
response is correct; that is, when he predicts B and the experimenter chooses 
B, or when he predicts A and the experimenter chooses A. Outcomes (1) 
and (4) are therefore cases where reward occurs, while outcomes (2) and (3) 
are cases of nonreward. Notice that the subject’s choice of response A is 
rewarded on some trials only; in other words, his learning of that response is 
only partially reinforced. 
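
The trial structure described above can be sketched in a few lines of Python. This is only an illustration: the function names `trial_outcome` and `experimenter_choice` are hypothetical, and the value 0.7 used for $\pi$ is an arbitrary choice, not a value taken from the paper.

```python
import random

def trial_outcome(subject_choice, experimenter_choice):
    """Return (outcome number 1-4, reward indicator) for one trial.

    Outcomes follow the table above: (1) A/A, (2) A/B, (3) B/A, (4) B/B,
    where the first letter is the subject's prediction.
    """
    table = {("A", "A"): 1, ("A", "B"): 2, ("B", "A"): 3, ("B", "B"): 4}
    outcome = table[(subject_choice, experimenter_choice)]
    # Reward occurs exactly when the prediction matches the experimenter.
    reward = 1 if subject_choice == experimenter_choice else 0
    return outcome, reward

def experimenter_choice(pi, rng):
    """Choose A with probability pi, independently of all earlier trials."""
    return "A" if rng.random() < pi else "B"

rng = random.Random(0)          # fixed seed so the sketch is repeatable
schedule = [experimenter_choice(0.7, rng) for _ in range(10)]
results = [trial_outcome("A", c) for c in schedule]  # subject always predicts A
```

A subject who always predicts A is rewarded only on the trials where the experimenter also chooses A, which is the sense in which response A is partially reinforced.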

In the case of continuous reinforcement, subjects tend to make the 
reinforced response with greater and greater frequency as the number of 
trials increases. Analogously for experiments of the kind described above 
where reinforcement is partial, subjects tend to make response A with a 
frequency that approaches the probability $\pi$ of reinforcement and response 
B with a frequency that approaches $1 - \pi$. Examples of such experiments 
appear in the literature [5]. 

The question concerning the effectiveness of reward and nonreward 
in the case of continuous reinforcement may be asked here. For instance, 
it will be implied by the model that the occurrence of either outcome (1) or 
outcome (3) on a trial will increase the subject’s probability of making 
response A on the next trial. Assuming that both outcomes contribute towards 
learning response A, is outcome (1), which involves reward, more effective 
than outcome (3), which does not? 

The next section will give in detail a statistical model for this type of 
experiment; the model is related to the basic Bush-Mosteller learning model 
[1] but slightly more general in nature. 






















2. The Model 


Although the learning situation now under consideration differs in 
some ways from the simple experiment of [3], the design of the experiment 
as well as the type of question under study are essentially the same. Here 
again, a subject is observed over a sequence of trials; his behavior at the 
beginning of each trial and the outcome of his action at the end are noted. 
Then one looks for the effects of that outcome on his behavior in the next 
trial. 

Analogous to the notation adopted in [3], denote by $p_{ij}$ the conditional 
probability that the subject $i$ makes response A on trial $j$, given his previous 
experience. Since one of the two responses A or B must occur, $q_{ij} = 1 - p_{ij}$ 
is the probability that he makes response B on trial $j$. Whatever response 
the subject makes, there are two possible outcomes to his choice: he is either 
rewarded or not rewarded. Denote the outcome of trial $j$ for subject $i$ by 
the value $y_{ij}$ of the random variable $Y_{ij}$, which is unity if he is rewarded 
and zero if he is not rewarded. Thus, corresponding to the sequence of trials 
$1, 2, \ldots, j, \ldots$, there are the following sequences for each subject. 

(i) A sequence of probabilities of making response A; this is 
$p_{i1}, p_{i2}, \ldots, p_{ij}, \ldots$ for the subject $i$. 

(ii) A sequence of outcomes in terms of reward or nonreward; for the 
subject $i$, this is $y_{i1}, y_{i2}, \ldots, y_{ij}, \ldots$. 


In the search for a general relation between $p_{ij}$ and $p_{i,j+1}$ with which 
to describe the change that occurs in the probability of response for the 
subject $i$ from one trial to the next, the following points should be considered. 

(i) It is usually implied in any learning situation (and was also assumed 
in [3]) that the probability $p_{i,j+1}$ in general depends on both $p_{ij}$ and the 
outcome $y_{ij}$ of trial $j$. 

(ii) In this particular learning situation, the dependence should be such 
that, as the number of trials increases, the probability $p_{ij}$ approaches $\pi$, the 
probability of reinforcement of response A. This tendency has been observed 
in a large and varied number of learning experiments of this type ([1], ch. 13). 

(iii) The expression for $p_{i,j+1}$ should involve some parameter, say $\theta$, 
which can be interpreted as a measure of the presence or lack of difference 
in the effects of reward and nonreward. 

Thus, one seeks an expression in which $\theta < 1$ plays the role of bringing 
$p_{i,j+1}$ closer to $\pi$ if the outcome of trial $j$ is reward ($y_{ij} = 1$) than if it is 
nonreward ($y_{ij} = 0$). 

With these considerations in mind, assume 

$$p_{i,j+1} - \pi = \theta^{y_{ij}} w_{j+1} (p_{ij} - \pi),$$

where $\theta$ is a constant such that $0 < \theta \le 1$, and $w_{j+1}$ is a constant which 
depends on the combination of factors (other than reward or nonreward) 
that bring the probability of response A closer to $\pi$ at the termination of the 
trial $j$. 
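This iterative relation may be used to derive a general expression for 
$p_{ij}$ by writing 

$$p_{i2} - \pi = \theta^{y_{i1}} w_2 (p_{i1} - \pi),$$

$$p_{i3} - \pi = \theta^{y_{i2}} w_3 (p_{i2} - \pi) = \theta^{y_{i1} + y_{i2}} w_2 w_3 (p_{i1} - \pi),$$

$$p_{i4} - \pi = \theta^{y_{i1} + y_{i2} + y_{i3}} w_2 w_3 w_4 (p_{i1} - \pi),$$

etc.; by iteration 

$$p_{ij} - \pi = \theta^{\sum_{k=1}^{j-1} y_{ik}} w_2 \cdots w_j (p_{i1} - \pi).$$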


Let us assume that $p_{i1}$ is the same for all $i$, in other words that all sub- 
jects have the same initial probability $p_1$ at the start of the first trial. An 
equivalent assumption was made in [3]. 

To consolidate the constants $p_1 - \pi, w_2, w_3, \ldots, w_j, \ldots$ into a more 
convenient notation, substitute 

$d_1$ for $p_1 - \pi$, 
$d_2$ for $w_2 (p_1 - \pi)$, 
$d_3$ for $w_2 w_3 (p_1 - \pi)$, 
$\ldots$ 
$d_j$ for $w_2 \cdots w_j (p_1 - \pi)$. 

With this notation, the general expression for the probability $p_{ij}$ becomes 

$$p_{ij} - \pi = \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j,$$

or, as a function of $\pi$, $\theta$, $d_j$, and the outcome of previous trials, 

$$p_{ij} = \pi + \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j, \tag{1}$$


where $0 < \pi \le 1$, and $d_j$ is a constant which may be positive or negative 
depending on the value of $\pi$. Since $p_{ij}$ is a probability, 

$$0 \le \pi + \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j \le 1,$$

while $0 < \theta \le 1$. This means that both inequalities 

$$-\pi \le \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j \quad \text{and} \quad \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j \le 1 - \pi$$

must hold while $0 < \theta \le 1$, or that $-\pi \le d_j \le 1 - \pi$ for each $j = 1, \ldots, m$. 
Actually, a slightly stronger restriction must be imposed on the $d_j$, namely 

$$-\pi < d_j < 1 - \pi \quad \text{for each } j = 1, \ldots, m.$$

This will be necessary for obtaining the results of section 3.2. 
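
As a check on the algebra, the following sketch compares the trial-by-trial recursion with the closed form (1). The function names and all numerical values ($\pi = 0.7$, $\theta = 0.8$, $p_1 = 0.5$, the $w$ values and the outcome sequence) are hypothetical and chosen only for illustration.

```python
from math import prod

def p_by_recursion(pi, theta, p1, w, y):
    """Apply p_{i,j+1} - pi = theta**y_ij * w_{j+1} * (p_ij - pi) repeatedly.

    w = [w_2, ..., w_j] and y = [y_1, ..., y_{j-1}] for one subject.
    """
    p = p1
    for w_next, y_prev in zip(w, y):
        p = pi + theta ** y_prev * w_next * (p - pi)
    return p

def p_closed_form(pi, theta, d_j, rewards_so_far):
    """Equation (1): p_ij = pi + theta**(number of earlier rewards) * d_j."""
    return pi + theta ** rewards_so_far * d_j

# Hypothetical illustration values, not taken from the paper.
pi, theta, p1 = 0.7, 0.8, 0.5
w = [0.9, 0.95, 0.9]      # w_2, w_3, w_4
y = [1, 0, 1]             # outcomes of trials 1, 2, 3

d_4 = prod(w) * (p1 - pi)                 # d_j = w_2 ... w_j (p_1 - pi)
p4_recursive = p_by_recursion(pi, theta, p1, w, y)
p4_closed = p_closed_form(pi, theta, d_4, sum(y))
```

Both routes give the same probability for trial 4, since the exponents of $\theta$ simply accumulate the number of previously rewarded trials.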

Equation (1) gives the probability that the subject $i$ chooses A on trial 
$j$, given that in the $j - 1$ previous trials he made exactly $\sum_{k=1}^{j-1} y_{ik}$ rewarded 
(A or B) choices. 

Setting $\theta = 1$ in (1) gives $p_{ij} = \pi + d_j$, which is equal to the probability 
that subject $i$ chooses A on trial $j$, given that none of his previous choices was 
rewarded. Thus, $\theta = 1$ implies that reward and nonreward (in the sense 
adopted in section 1) have equivalent effects on the subject’s future prob- 
ability of making choice A, or of his learning of response A. 

On the other hand, writing (1) as 

$$p_{ij} - \pi = \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j,$$

and remembering that learning takes place as $p_{ij} \to \pi$, consider the impli- 
cation of $\theta < 1$ on the value of $|p_{ij} - \pi|$. If $\theta < 1$, then $|p_{ij} - \pi|$ 
decreases as $\sum_{k=1}^{j-1} y_{ik}$ increases. But $\sum_{k=1}^{j-1} y_{ik}$ is the total number of previously 
rewarded responses made by the subject $i$ before trial $j$, and $\sum_{k=1}^{j-1} y_{ik} = 0$ 
implies that none of the subject’s previous responses was rewarded. There- 
fore, if $\theta < 1$, reward has a stronger effect on making $|p_{ij} - \pi|$ small than 
has nonreward. 
A similar analysis of the implication of $\theta < 1$ on 

$$q_{ij} = 1 - p_{ij} = P[\text{subject } i \text{ chooses B on trial } j \mid \text{outcomes of all previous trials}]$$

would lead to the conclusion that reward has a stronger effect on making 
$|1 - \pi - q_{ij}|$ small than has nonreward. 
Notice that when $\pi = 1$, (1) reduces to 

$$p_{ij} = 1 + \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j \quad \text{with } -1 < d_j < 0;$$

therefore, 

$$q_{ij} = -\theta^{\sum_{k=1}^{j-1} y_{ik}} d_j,$$

which is equation (2) of [3], with $\sum_{k=1}^{j-1} x_{ik}$ replaced by $\sum_{k=1}^{j-1} y_{ik}$ and $u_j$ re- 
placed by $-d_j$. In fact, when $\pi = 1$, the whole experiment reduces to the 
situation for which the model of [3] was constructed. 
For the distribution of the random variables $Y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) 
under a set $\Omega$ of admissible hypotheses, assume the following. 

(i) On the first trial all subjects make choice A with the same prob- 
ability $p_1$, so that 

$$p_{i1} = p_1 = \pi + d_1 \quad \text{for all } i = 1, \ldots, r.$$

(ii) The subjects act and learn independently of each other, so that 
the $Y_{ij}$ are independent for different $i$. 

(iii) $$p_{ij} = \pi + \theta^{\sum_{k=1}^{j-1} y_{ik}} d_j,$$

where $0 < \theta \le 1$, $-\pi < d_j < 1 - \pi$, and $j = 1, \ldots, m$. 

Assumptions equivalent to (i) and (ii) were made on the model in [3]. 
Assumption (iii) is a restatement of equation (1), which has already been 
discussed. 
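
Assumptions (i) to (iii) are enough to simulate data from the model. The sketch below is a minimal illustration rather than anything given in the paper: the function name `simulate_rewards`, the seed, and the parameter values in the usage lines are all hypothetical.

```python
import random

def simulate_rewards(r, m, pi, theta, d, seed=0):
    """Simulate the reward indicators y_ij for r subjects over m trials.

    d = [d_1, ..., d_m] must satisfy -pi < d_j < 1 - pi so that every
    p_ij = pi + theta**T * d_j is a valid probability.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(r):                      # subjects act independently
        rewards_so_far = 0                  # T_{i,j-1}: earlier rewarded trials
        row = []
        for j in range(m):
            p_a = pi + theta ** rewards_so_far * d[j]   # equation (1)
            subject_picks_a = rng.random() < p_a
            experimenter_picks_a = rng.random() < pi    # independent schedule
            y = 1 if subject_picks_a == experimenter_picks_a else 0
            row.append(y)
            rewards_so_far += y
        data.append(row)
    return data

# Hypothetical parameter values for illustration only.
d = [-0.2 * 0.9 ** j for j in range(8)]
y = simulate_rewards(r=5, m=8, pi=0.7, theta=0.8, d=d)
```

Each row of `y` is one subject's sequence $y_{i1}, \ldots, y_{im}$, which is the form of data the test in section 3 takes as input.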


3. Testing the Hypothesis $\theta = 1$ 
3.1 Distribution of the Random Variables 


Considerations with regard to the properties desirable in this test are 
the same as those discussed in connection with the test of the hypothesis 
$\theta = 1$ in [3]. Both tests are similar in form and possess the same optimum 
asymptotic properties. This is not surprising since, owing to the close corre- 
spondence between the two models, the joint probability distributions of 
the random variables $Y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$), both under the set 
of admissible hypotheses and under the hypothesis tested, are directly 
analogous to the respective probability distributions of the $X_{ij}$ ($i = 1, \ldots, r$; 
$j = 1, \ldots, m$) as given in equations (4) and (5) of [3]. 

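The probability distribution of the $Y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) 
under the set $\Omega$ of admissible hypotheses can be deduced from (1) as follows: 

$$\begin{aligned}
P[Y_{ij} = 1 \mid y_{i1}, \ldots, y_{i,j-1}; \Omega]
&= p_{ij}\pi + (1 - p_{ij})(1 - \pi) \\
&= \pi(2\pi - 1) + 1 - \pi + \theta^{T_{i,j-1}} d_j (2\pi - 1) \\
&= 1 - 2\pi(1 - \pi) - \theta^{T_{i,j-1}} d_j (1 - 2\pi),
\end{aligned}$$

where $T_{i,j-1}$ denotes the sum $\sum_{k=1}^{j-1} y_{ik}$. Substitute 

$\lambda$ for $2\pi(1 - \pi)$, 
$a_j$ for $d_j(1 - 2\pi)$. 

Then $0 < \lambda \le .5$, while $a_j = 0$ when $\pi = .5$, and $-\pi < a_j/(1 - 2\pi) < 1 - \pi$ 
otherwise. Then the probability of reward for the subject $i$ on the trial $j$ 
can be written more briefly as 

$$P[Y_{ij} = 1 \mid y_{i1}, \ldots, y_{i,j-1}; \Omega] = 1 - \lambda - \theta^{T_{i,j-1}} a_j.$$

It follows that 

$$P[Y_{ij} = 0 \mid y_{i1}, \ldots, y_{i,j-1}; \Omega] = \lambda + \theta^{T_{i,j-1}} a_j,$$

and therefore, 

$$P[Y_{ij} = y_{ij} \mid y_{i1}, \ldots, y_{i,j-1}; \Omega] = (1 - \lambda - \theta^{T_{i,j-1}} a_j)^{y_{ij}} (\lambda + \theta^{T_{i,j-1}} a_j)^{1 - y_{ij}}.$$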


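This is a conditional distribution, from which the joint distribution of $Y_{ij}$ 
for all $j = 1, \ldots, m$ and then for all $i = 1, \ldots, r$ can be deduced, giving 

$$P[Y_{ij} = y_{ij}, \text{ all } i \text{ and } j \mid \Omega] = \prod_{i=1}^{r} \prod_{j=1}^{m} (1 - \lambda - \theta^{T_{i,j-1}} a_j)^{y_{ij}} (\lambda + \theta^{T_{i,j-1}} a_j)^{1 - y_{ij}}. \tag{2}$$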
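
The reduction to $\lambda$ and $a_j$ and the joint distribution (2) can be checked numerically. The sketch below is illustrative only; `reward_prob` and `log_likelihood` are hypothetical names, and the parameter values in the usage lines are arbitrary.

```python
import math

def reward_prob(lam, theta, a_j, rewards_so_far):
    """P[Y_ij = 1 | earlier outcomes] = 1 - lambda - theta**T * a_j."""
    return 1.0 - lam - theta ** rewards_so_far * a_j

def log_likelihood(y, pi, theta, d):
    """Logarithm of the joint probability (2) for an r-by-m 0/1 table y."""
    lam = 2.0 * pi * (1.0 - pi)
    a = [d_j * (1.0 - 2.0 * pi) for d_j in d]
    total = 0.0
    for row in y:
        rewards_so_far = 0                      # T_{i,j-1}
        for j, y_ij in enumerate(row):
            p = reward_prob(lam, theta, a[j], rewards_so_far)
            total += math.log(p if y_ij == 1 else 1.0 - p)
            rewards_so_far += y_ij
    return total

# Check the identity p*pi + (1-p)*(1-pi) = 1 - lambda - theta**T * a_j
pi, theta, d_j, T = 0.7, 0.8, -0.2, 2           # hypothetical values
p_a = pi + theta ** T * d_j                     # equation (1)
direct = p_a * pi + (1.0 - p_a) * (1.0 - pi)
reduced = reward_prob(2 * pi * (1 - pi), theta, d_j * (1 - 2 * pi), T)
```

When $\pi = .5$ every $a_j$ vanishes, so the reward probability is .5 on every trial regardless of $\theta$; in that case the data carry no information about $\theta$ through this likelihood.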
The hypothesis being tested is again the composite hypothesis 

$$H: \theta = 1$$

against the set of alternatives $0 < \theta < 1$, with $a_1, \ldots, a_m$ as the unknown 
parameters. Setting $\theta = 1$ in (2), 

$$P[Y_{ij} = y_{ij}, \text{ all } i \text{ and } j \mid H] = \prod_{j=1}^{m} (1 - \lambda - a_j)^{\sum_{i=1}^{r} y_{ij}} (\lambda + a_j)^{r - \sum_{i=1}^{r} y_{ij}}. \tag{3}$$

3.2 An Asymptotically Locally Best Similar Test of H 


In testing the composite hypothesis $H$ at level of significance $\alpha$, one 
uses the procedure outlined in [3] and looks first for a size $\alpha$ test that is both 
similar and uniformly most powerful against all alternatives in $\Omega$. Since 
neither such a test nor one that is locally most powerful exists ([2], sections 
9.1 and 9.2), one adopts the alternative approach of constructing a test that 
is asymptotically similar, namely a test defined by a sequence of critical 
functions $\psi_r(y)$ such that, as $r \to \infty$, the expected value of $\psi_r(y)$ under the 
hypothesis tested, $H$, tends to the preassigned level $\alpha$, independently of the 
specified parameters. 










A method for constructing such a test was described in [3]; this method 
is an extension to the case of several nuisance parameters of a method first 
given by Neyman [4]. In applying the extended method to the present 
problem, one follows the construction outlined in ([3], pp. 60-63), starting 
with a set of $r$ independently distributed functions of the observations, 
$\zeta_1, \ldots, \zeta_r$, and following through the same series of transformations that 
lead, in this case, to a sequence of critical functions $\psi_r(y)$. Owing to the basic 
analogy between the probability distribution of the random variables 
$Y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) on the one hand and that of the random 
variables $X_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) of [3] on the other, and to the 
nature of the hypothesis tested in either case, both the construction and the 
resulting properties of the asymptotic test are identical. 

In this case, the set of $r$ functions $\zeta_i$ are defined as 

$$\zeta_i = \sum_{j=2}^{m} \frac{a_j}{(1 - \lambda - a_j)(\lambda + a_j)} \left[ y_{ij} - (1 - \lambda - a_j) \right] T_{i,j-1} \tag{4}$$

for $i = 1, \ldots, r$, where $\lambda = 2\pi(1 - \pi)$ and $T_{i,j-1} = \sum_{k=1}^{j-1} y_{ik}$. This form 
of the function is obtained from the probability distribution (2) by consider- 
ations analogous to those that led to equations (6) and (7) of [3] and indeed 
reduces to the form of equation (7) of [3] in the case $\pi = 1$. For each $i$, the 
function $\zeta_i$ depends on $\lambda$ and the unknown parameters $a_1, \ldots, a_m$, as well 
as the observation $y_i = (y_{i1}, \ldots, y_{im})$ on the random vector $Y_i = (Y_{i1}, \ldots, Y_{im})$. 
From (3) it is clear that, under the hypothesis $H$, the $r$ vectors 
$Y_1, \ldots, Y_r$ are independently and identically distributed with the prob- 
ability distribution 

$$P[Y_i = y_i \mid \lambda, a_1, \ldots, a_m; H] = \prod_{j=1}^{m} (1 - \lambda - a_j)^{y_{ij}} (\lambda + a_j)^{1 - y_{ij}} \tag{5}$$

for $i = 1, \ldots, r$. 
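Following the notation used in [3], now let 

$$\varphi_j = \frac{\partial}{\partial a_j} \log P[Y_i = y_i \mid \lambda, a_1, \ldots, a_m; H]$$

for $j = 1, \ldots, m$; also let $Z_i$ and $\Phi_j$ be the standardized values respectively 
of $\zeta_i$ and $\varphi_j$, and $R_j = E(Z_i \Phi_j)$. Then, for each $i = 1, \ldots, r$, 

$$\psi_i(y_i; \lambda, a_1, \ldots, a_m) = \frac{Z_i - \sum_{j=1}^{m} R_j \Phi_j}{\sqrt{1 - \sum_{j=1}^{m} R_j^2}}; \tag{6}$$

from these $r$ functions 

$$W = \frac{1}{\sqrt{r}} \sum_{i=1}^{r} \psi_i(y_i; \lambda, a_1, \ldots, a_m). \tag{7}$$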















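This function, which corresponds to the function of the same form in [3], depends upon the unknown 
parameters $a_1, \ldots, a_m$, whose maximum likelihood estimates $a_1^*, \ldots, a_m^*$, 
computed from the probability distribution (3), are 

$$a_j^* = 1 - \lambda - \frac{e_j}{r} \tag{8}$$

for $j = 1, \ldots, m$, where $e_j = \sum_{i=1}^{r} y_{ij}$. 
Replacing the unknown parameters $a_1, \ldots, a_m$ in (6) by their estimates 
$a_1^*, \ldots, a_m^*$, and simplifying the resulting expression, 

$$W^* = \frac{\displaystyle \sum_{i=1}^{r} \sum_{j=2}^{m} \frac{r(1 - \lambda) - e_j}{e_j (r - e_j)} \, y_{ij} \sum_{k=1}^{j-1} y_{ik} \; - \; \frac{1}{r} \sum_{j=2}^{m} \frac{r(1 - \lambda) - e_j}{r - e_j} \sum_{k=1}^{j-1} e_k}{\displaystyle \left[ \frac{1}{r^3} \sum_{j=2}^{m} \frac{[r(1 - \lambda) - e_j]^2}{e_j (r - e_j)} \sum_{k=1}^{j-1} e_k (r - e_k) \right]^{1/2}}. \tag{9}$$

Like the function (12) in [3], the function $W^*$ can be shown to have a prob- 
ability distribution which, under the hypothesis tested, approaches the 
normal with zero mean and unit variance as $r \to \infty$. 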

Since the function $W^*$ is not defined when either $e_j = 0$ or $e_j = r$ for 
some $j$, replace the vanishing quantity ($e_j$ or $r - e_j$) by some function of $r$, 
say $1/r$, which approaches zero as $r \to \infty$. Clearly, this modification in the 
value of $W^*$ does not affect the asymptotic behavior of the function. The 
asymptotic test is now defined as follows. Compute $W^*$ from the data, 
choose a level of significance, and reject $H$ with probability $\psi_r(y)$, where 

$$\begin{aligned}
\psi_r(y) &= 1 \quad \text{whenever } W^* > k_\alpha, \\
\psi_r(y) &= 0 \quad \text{whenever } W^* < k_\alpha, \\
0 \le \psi_r(y) &\le 1 \quad \text{whenever } W^* = k_\alpha,
\end{aligned}$$

where $k_\alpha$ is defined by 

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{k_\alpha} e^{-t^2/2} \, dt = 1 - \alpha.$$

(In this integral $\pi$ denotes the usual mathematical constant.) 
This test has the same properties as the corresponding test of [3]; it is a locally best test 
of $H$ against alternatives with $\theta < 1$, the parameters $a_1, \ldots, a_m$ being 
unspecified, and asymptotically similar, of size $\alpha$. 





3.3 Application of the Asymptotic Test 


Suppose a set of $rm$ observations $y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) is given, where $r$ 
is the number of subjects in an experiment of $m$ trials, with 

$$y_{ij} = \begin{cases} 1 & \text{if the subject } i \text{ is rewarded on trial } j, \\ 0 & \text{if the subject } i \text{ is not rewarded on trial } j. \end{cases}$$

For testing the hypothesis $H$: $\theta = 1$ (reward and nonreward are equally 
effective) against the alternatives that $\theta < 1$ (reward is more effective) at 
the level of significance $\alpha$, apply the following rule. 

(i) From the data, form the sums 

$$e_j = \sum_{i=1}^{r} y_{ij}$$

for $j = 1, \ldots, m$; $e_j$ is then the total number of subjects rewarded on the 
trial $j$. Let 

$$e_j^* = \begin{cases} e_j & \text{whenever } e_j \ne 0 \text{ or } r, \\ 1/r & \text{whenever } e_j = 0, \\ (r^2 - 1)/r & \text{whenever } e_j = r. \end{cases}$$

(ii) From the experiment, determine $\lambda = 2\pi(1 - \pi)$ and $1 - \lambda$. 
(iii) Compute 


r m 1-—r) -2e imi 
M= yo De 


i=1 j=2 ex(r — é;) 


m ys inl 
uk om \) i> ef, 








> eee ef Par 
1afrdi—»-eéP S 
ie so 7 , ee 
or = a Bat Zar — a 
and combine these three quantities to obtain 
- M-C' 
(10) W = oy: 


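(iv) From a table of the normal distribution function find the upper $\alpha$ 
point, $k_\alpha$, defined by 

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{k_\alpha} e^{-t^2/2} \, dt = 1 - \alpha. \tag{11}$$

Then reject $H$ with probability $\psi(y)$, where 

$$\begin{aligned}
\psi(y) &= 1 \quad \text{if } W > k_\alpha, \\
\psi(y) &= 0 \quad \text{if } W < k_\alpha, \\
0 \le \psi(y) &\le 1 \quad \text{if } W = k_\alpha,
\end{aligned} \tag{12}$$

the probability $\psi(y)$ in the latter case to be chosen so as to make the expected 
value of $\psi(y)$ exactly equal to $\alpha$ if $H$ is true. 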
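
Steps (i) to (iv) can be sketched in Python as follows. This is a hedged illustration, not the author's program: the function names are hypothetical, and the formulas for $M$, $C'$, $C''$ and for the adjusted total when $e_j = r$ follow the reading of the printed expressions adopted in this text, which should be treated as an assumption. The small data table in the usage lines is invented.

```python
import math
from statistics import NormalDist

def adjusted_totals(y):
    """Step (i): column totals e_j, with 0 and r moved off the boundary by 1/r."""
    r, m = len(y), len(y[0])
    totals = [sum(row[j] for row in y) for j in range(m)]
    adjusted = []
    for e_j in totals:
        if e_j == 0:
            adjusted.append(1.0 / r)
        elif e_j == r:
            adjusted.append(r - 1.0 / r)   # assumed reading: r - e_j becomes 1/r
        else:
            adjusted.append(float(e_j))
    return adjusted

def w_statistic(y, pi):
    """Steps (ii)-(iii): W = (M - C') / C'' for an r-by-m table of 0/1 rewards.

    C'' is zero (and W undefined) if r(1 - lambda) equals e_j for every j >= 2.
    """
    r, m = len(y), len(y[0])
    lam = 2.0 * pi * (1.0 - pi)
    e = adjusted_totals(y)
    M = c1 = c2_squared = 0.0
    for j in range(1, m):                  # index j here is trial j + 1
        num = r * (1.0 - lam) - e[j]
        y_times_t = sum(row[j] * sum(row[:j]) for row in y)
        M += num / (e[j] * (r - e[j])) * y_times_t
        c1 += num / (r - e[j]) * sum(e[:j]) / r
        c2_squared += (num ** 2 / (e[j] * (r - e[j]))
                       * sum(e_k * (r - e_k) for e_k in e[:j]) / r ** 3)
    return (M - c1) / math.sqrt(c2_squared)

def upper_point(alpha):
    """Step (iv): the upper alpha point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha)

# Invented 4-subject, 3-trial table, for illustration only.
y = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
w = w_statistic(y, pi=0.5)
reject = w > upper_point(0.05)
```

With so few subjects the normal approximation is of course not to be trusted; the table above only exercises the arithmetic.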



















3.4 Illustration 

For illustration, data from an experiment on prediction with partial 
reinforcement run by Irwin and Mayfield are used*. The subjects, 20 women 
undergraduates at the University of Pennsylvania, were asked at the start 
of each trial to predict whether or not a light would go on. Then the light 
went on with a probability $\pi = .5$, and therefore $\lambda = .5$. The correct pre- 
dictions are shown as entries of unity in Table 1, with entries of zero for 
incorrect predictions. 


TABLE 1 


Data of Irwin and Mayfield from a “Humphreys-type” Experiment with $\pi = .5$ (a) 














Trials (j) 
Subjects (i) 

  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
1 RR Beak Se es 1 1 1 1 Be Ae Poe i Od 1 Ces l 
2 . ee 1 1 coe ae oe oe 1 .) Sep to Cee tee 
3 0 1 1 Oo" Oo a i) 1 0 l 1 l 1 0 60 l 0 l o 2.9 1 
4 Oo. o1 1 0 1 oo O48 1 1 ae 1 1 0 l 1 1 1 i 0 1 
5 o 0.00. 60-8 1 1 1 1 ie 1 l 1 1 1 O°. O" Os @ 
6 0 1 oO 0.0 1 1 Oy ieee 0 1 1 1 ee eee l Oro8 0 
mu oe 4 l 1 O'.0 3 0 60 1 l 1 0 1 1 i) 1 l 1 0 l 1 
8 Fo Cesk Fh ee, ae BS SOS Se eee ae eer oe 
9 es a | D. Bee Oe 3 yO ok ees, Fae 1 Bs i Ord 1 1 1 l 
10 0 1 i) 1 1 0 1 1 Oo 60 1 1 0 1 0 1 1 1 0 1 1 1 
ll 0. @ «6.3 1 Ae, TN ek i, Bee 1 0 1 1 1 1 1 1 1 0 1 
12 O-.2S Oo ae ee ee SO ari a Rr Bi Oe ae a 
13 1 1. O..1 Ok B20: KI a a ee 1 1 ok aoe 
14 1 o,°9 1 Oo 1 o 0.6 1 0.6 8. 8&3 1 1 0 1 l o..0 
15 a 0 1 1 Oo 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 
16 | 1 | Vag, Ee 1. 8 Ox 1 1 1 Pst Rae: 3 1 1 i ® 
17 0 1 Oo 1 1 0 oO 1 1 1 0 Oo 1 0. O-< 6 1 0 1 l 1 0 
18 o~ & 0 0 O 0 1 0 60 1 se 1 oO." 0 1 1 l 1 0 l 
19 2. ee ee ee ae Se “i... © , Sag ) Same “Sa beim | | ea a 
20 See Ke ee eo eee ee ee 1 1 ee ee Ay 
Total ($e_j$)  3 12  5 13  9  6 11 12  7 12 12 11 10 15  9  8 12 14 15 13  9 12 





(a) An entry of 1 indicates $y_{ij} = 1$ (reward, or correct prediction); a zero entry indicates $y_{ij} = 0$. 


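In this experiment there are 20 subjects ($r = 20$) and 22 trials ($m = 22$). 
For each $j$ ($j = 1, \ldots, m$) first compute 

$$e_j = \sum_{i=1}^{20} y_{ij},$$

which yields 

$$e_j, \quad 20 - e_j, \quad 10 - e_j, \quad \frac{10 - e_j}{20 - e_j}, \quad \frac{10 - e_j}{e_j (20 - e_j)}, \quad \frac{(10 - e_j)^2}{e_j (20 - e_j)}.$$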





*Personal communication from F. W. Irwin. 










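Using these values, find 

$$\sum_{k=1}^{j-1} e_k, \qquad \sum_{k=1}^{j-1} e_k (20 - e_k)$$

for each $j = 2, \ldots, 22$ and then compute 

$$M = \sum_{i=1}^{20} \sum_{j=2}^{22} \frac{10 - e_j}{e_j (20 - e_j)} \, y_{ij} \sum_{k=1}^{j-1} y_{ik},$$

$$C' = \frac{1}{20} \sum_{j=2}^{22} \frac{10 - e_j}{20 - e_j} \sum_{k=1}^{j-1} e_k,$$

$$C'' = \left[ \frac{1}{20^3} \sum_{j=2}^{22} \frac{(10 - e_j)^2}{e_j (20 - e_j)} \sum_{k=1}^{j-1} e_k (20 - e_k) \right]^{1/2},$$

as shown in Table 2. Then, by (10), 

$$W = \frac{M - C'}{C''} = 3.89.$$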





TABLE 2 


Computation of W from Data in Table 1 (a) 








Column key: (A) $(10 - e_j)/(20 - e_j)$; (B) $(10 - e_j)/[e_j(20 - e_j)]$; (C) $(10 - e_j)^2/[e_j(20 - e_j)]$; 
(D) $\sum_{k=1}^{j-1} e_k$; (E) $\sum_{k=1}^{j-1} e_k(20 - e_k)$; (F) $\sum_{i} y_{ij} \sum_{k=1}^{j-1} y_{ik}$. 

 j   e_j  20-e_j  10-e_j     (A)       (B)      (C)     (D)    (E)   (F) 
 1    3     17       7     0.4118    0.1373   0.9608 
 2   12      8      -2    -0.2500   -0.0208   0.0416     3     24     1 
 3    5     15       5     0.3330    0.0667   0.3333    15     75     4 
 4   13      7      -3    -0.4286   -0.0330   0.0989    20    140    13 
 5    9     11       1     0.9091    0.0101   0.0101    33    363    14 
 6    6     14       4     0.2857    0.0476   0.1905    42    588    14 
 7   11      9      -1    -0.1111   -0.0101   0.0101    48    432    22 
 8   12      8      -2    -0.2500   -0.0208   0.0417    59    472    27 
 9    7     13       3     0.2308    0.0330   0.0989    71    923    27 
10   12      8      -2    -0.2500   -0.0208   0.0417    78    624    47 
11   12      8      -2    -0.2500   -0.0208   0.0417    90    720    55 
12   11      9      -1    -0.1111   -0.0101   0.0101   102    918    59 
13   10     10       0     0.0000    0.0000   0.0000   113   1130    59 
14   15      5      -5    -1.0000   -0.0667   0.3333   123    615    95 
15    9     11       1     0.9091    0.0101   0.0101   138   1516    54 
16    8     12       2     0.1667    0.0208   0.0417   147   1769    59 
17   12      8      -2    -0.2500   -0.0208   0.0417   155   1240    88 
18   14      6      -4    -0.6667   -0.0476   0.1905   167   1002   116 
19   15      5      -5    -1.0000   -0.0667   0.3333   181    905   139 
20   13      7      -3    -0.4286   -0.0330   0.0989   196   1372   127 
21    9     11       1     0.9091    0.0101   0.0101   209   2299    98 
22   12      8      -2    -0.2500   -0.0208   0.0417   218   1744   140 





(a) $\lambda = 2(.5)(1 - .5) = .5$; $1 - \lambda = .5$. 

$C'' = 0.4189$; $W = (M - C')/C'' = 3.8931$. 


To test the hypothesis that $\theta = 1$ (agreement and contradiction are 
equally effective) at the level of significance .05, the rule given by (12) is to 
reject the hypothesis tested if $W > k_{.05} = 1.645$. The computations above 
would therefore lead us to reject the hypothesis that $\theta = 1$, or accept the 
alternatives that reward (making a correct prediction) was more effective in 
this experiment. 
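
The columns of Table 2 that depend only on the trial totals can be recomputed directly, and the final comparison with the critical value is a one-line check. The sketch below assumes the totals $e_j$ as read from Table 2 and takes the reported value $W = 3.89$ as given; the variable names are hypothetical.

```python
from statistics import NormalDist

# Trial totals e_j (number of subjects rewarded on trial j), as read from Table 2.
e = [3, 12, 5, 13, 9, 6, 11, 12, 7, 12, 12, 11, 10, 15, 9, 8, 12, 14, 15, 13, 9, 12]
r = 20
half = r * (1 - 0.5)          # r(1 - lambda) with lambda = .5

# Per-trial derived quantities used in M, C' and C''.
derived = [
    {
        "e": e_j,
        "r_minus_e": r - e_j,
        "half_minus_e": half - e_j,
        "ratio_a": (half - e_j) / (r - e_j),
        "ratio_b": (half - e_j) / (e_j * (r - e_j)),
        "ratio_c": (half - e_j) ** 2 / (e_j * (r - e_j)),
    }
    for e_j in e
]

# Running totals of e_k over the trials before trial j (zero for j = 1).
cumulative = [sum(e[:j]) for j in range(len(e))]

# Decision at the .05 level using the value of W reported in the text.
k_05 = NormalDist().inv_cdf(0.95)
reject_h = 3.89 > k_05
```

Recomputing the derived columns in this way is a convenient guard against copying errors when the table is transcribed.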

A difficulty that arises in applying the asymptotic test to experimental 
data in problems of partial reinforcement concerns the assumption about 
the probability $\pi$. The assumption is that the experimenter makes his choice 
of one response on each trial with probability $\pi$ independently of choices 
made on previous trials. Thus, the experimenter is assumed to build up his 
schedule of choices by some completely random process. He may, for instance, 
use the first $m$ entries on a page of random numbers to decide his choices on 
the $m$ trials of the experiment. However, the common procedure among 
experimenters is to randomize the choices within blocks of trials, keeping 
the relative frequency of choice A in each block exactly equal to $\pi$ and the 
relative frequency of choice B exactly $1 - \pi$. Thus there is a dependence in 
this practice among the choices made by the experimenter within each 
block; this dependence violates one of the assumptions in the model. How- 
ever, this dependence becomes negligible as the number of trials per block is 
increased. 
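
The difference between the two scheduling procedures can be made concrete with a short sketch. The function names and the block sizes used below are hypothetical; the conditional probability is the elementary one for drawing without replacement from a block with a fixed number of A choices.

```python
import random

def independent_schedule(m, pi, rng):
    """Each trial is A with probability pi, independently of the others."""
    return ["A" if rng.random() < pi else "B" for _ in range(m)]

def block_schedule(n_blocks, block_size, pi, rng):
    """Shuffle within blocks so that each block has exactly pi * block_size A's."""
    n_a = round(block_size * pi)
    schedule = []
    for _ in range(n_blocks):
        block = ["A"] * n_a + ["B"] * (block_size - n_a)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

def prob_a_after_a(block_size, pi):
    """P(second choice in a block is A | first was A) under block randomization."""
    n_a = round(block_size * pi)
    return (n_a - 1) / (block_size - 1)

rng = random.Random(0)
blocked = block_schedule(n_blocks=4, block_size=10, pi=0.5, rng=rng)
small_block = prob_a_after_a(10, 0.5)     # 4/9, noticeably below .5
large_block = prob_a_after_a(100, 0.5)    # 49/99, close to .5
```

Under independent scheduling the conditional probability would equal $\pi$ exactly; the gap from $\pi$ shrinks as the block grows, which is the sense in which the dependence becomes negligible.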

Another common procedure among experimenters is to set up only one 
schedule of choices and use it on all r subjects simultaneously. This is an 
excellent time-saving device, and since a subject’s learning depends on his 
responses as well as on the experimenter’s choices, the same schedule need 
not produce identical effects on different subjects. However, it introduces an 
element of dependence among their experiences, which is not in perfect 
agreement with the model. 


3.5 Testing Against Other Hypotheses 


As in [3], the result of the previous section may be extended by allowing 
for the possibility of $\theta > 1$, or for nonreward to have a greater effect than 
reward. For this purpose, consider the distribution of the random variables 
$Y_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, m$) under a larger set of admissible hypotheses 
$\Omega_1$, in which the only addition to $\Omega$ will be a provision that $\theta$ lies in a larger 
interval than $0 < \theta \le 1$, extending beyond unity to the right. 

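Under the set $\Omega_1$, let $\theta$ be in the interval $0 < \theta < \theta_0$, where $\theta_0 > 1$. 
Since the probability 

$$1 - \lambda - \theta^{\sum_{k=1}^{j-1} y_{ik}} a_j$$

must lie between zero and unity, 

(i) when $\max a_j$ is positive, 

$$0 \le 1 - \lambda - \theta a_j, \qquad \theta \le \frac{1 - \lambda}{\max a_j};$$

(ii) when $\min a_j$ is negative, 

$$1 - \lambda - \theta a_j \le 1, \qquad \theta \le \frac{\lambda}{-\min a_j};$$

therefore $\theta_0$ is the smaller of these two fractions. 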
In this new set of admissible hypotheses, the asymptotic test described by 
(10) to (12) can be modified for testing the hypothesis 

$$H_1: \theta = 1$$

against the alternatives $1 < \theta < \theta_0$. This situation is the mirror image of 
the original problem for which the asymptotic test was obtained. In fact, 
it can be easily seen that the asymptotically locally best similar test of $H_1$ 
in $\Omega_1$ is given by 

$$\begin{aligned}
\psi(y) &= 1 \quad \text{whenever } W < k_\alpha, \\
\psi(y) &= 0 \quad \text{whenever } W > k_\alpha, \\
0 \le \psi(y) &\le 1 \quad \text{whenever } W = k_\alpha,
\end{aligned} \tag{13}$$

where $W$ and $k_\alpha$ are defined exactly as in (10) and (11). Notice that (12) 
and (13) differ only in the direction of the inequality signs between $W$ and $k_\alpha$. 

Of the two hypotheses, $H$ and $H_1$, the decision as to which to test in a 
particular case actually depends on the problem with which the experimenter 
is concerned. If he is largely concerned with the question of whether or not 
reward is more effective, then the hypothesis tested is $H$: $\theta = 1$ against 
$0 < \theta < 1$. If, on the other hand, he suspects that nonreward (he may wish 
to describe it as punishment) is the more effective of the two, then the 
appropriate hypothesis to test is $H_1$: $\theta = 1$ against $\theta > 1$ in the set $\Omega_1$ of 
admissible hypotheses. 


REFERENCES 


[1] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955. 

[2] Hanania, M. Some statistical tests of hypotheses on the effect of reward in learning. 
Unpublished doctoral dissertation, Univ. California, Berkeley, 1957. 

[3] Hanania, M. I. A generalization of the Bush-Mosteller model with some significance 
tests. Psychometrika, 1959, 24, 53-68. 

[4] Neyman, J. Sur une famille de tests asymptotiques des hypothèses statistiques com- 
posées. Trabajos de Estadistica, 1954, 5, 161-168. 

[5] Goodnow, J. J. and Postman, L. Probability learning in a problem solving situation. 
J. exp. Psychol., 1955, 49, 16-22. 


Manuscript received 9/9/58 
Revised manuscript received 7/11/59 





PSYCHOMETRIKA—VOL, 25, No. 1 
MARCH, 1960 


AN EMPIRICAL STUDY OF THE NORMALITY AND 
INDEPENDENCE OF ERRORS OF MEASUREMENT 
IN TEST SCORES* 


Frederic M. Lord 


EDUCATIONAL TESTING SERVICE 


An empirical study of test scores shows the variance of the errors of 
measurement to be significantly associated with true score in each of four 
groups studied; it also shows the distribution of the errors of measurement 
to be significantly skewed in three of these four groups. The mathematical 
rationale underlying the statistical treatment is presented. Standard error 
formulas are given for making the necessary significance tests. 


It is usually convenient to think of errors of measurement as being 
distributed normally, independently of each other and of the true value 
that is being measured. In most mental testing situations, however, there 
are upper and lower limits to the range of scores that may be assigned. It 
follows from this that the frequency distribution of the errors of measure- 
ment cannot be the same when the true score is close to zero, for example, 
as it is when the true score has some less extreme value. 
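
The point about bounded score ranges can be illustrated numerically under one simple error model. The sketch below assumes, purely for illustration, that the observed score on an $n$-item test is binomially distributed about the true score; this binomial model and the names used are assumptions of the sketch, not something stated in this paper.

```python
import math

def error_skewness(n_items, true_proportion):
    """Skewness of the error (observed score minus true score).

    Assumes, for illustration only, a binomial observed score with
    n_items trials and success probability true_proportion.
    """
    p = true_proportion
    mean = n_items * p
    pmf = [
        math.comb(n_items, x) * p ** x * (1 - p) ** (n_items - x)
        for x in range(n_items + 1)
    ]
    variance = sum(w * (x - mean) ** 2 for x, w in enumerate(pmf))
    third_moment = sum(w * (x - mean) ** 3 for x, w in enumerate(pmf))
    return third_moment / variance ** 1.5

low = error_skewness(20, 0.05)    # true score near the floor: positive skew
middle = error_skewness(20, 0.50) # mid-range true score: symmetric errors
high = error_skewness(20, 0.95)   # true score near the ceiling: negative skew
```

Under this assumed model the errors cannot fall far below a true score near zero, so their distribution is skewed upward there and symmetric only in the middle of the range.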

The normality and independence assumptions, if valid, would provide 
a very simple and convenient model for test theory. The question arises: 
to what extent do deviations from these assumptions occur in practice? 
Indeed, can deviations from these assumptions be empirically demonstrated 
at all? The present study is an attempt to answer these questions. After 
outlining the basic assumptions and the method to be used, later sections 
describe the data for the empirical study and present the results, showing 
that the normality and independence assumptions do not fit the data. 
Finally, the technical details on the determination of the estimators and of
the standard errors required for the significance tests are presented.


Mathematical Formulation 


The question at hand relates to the hypothetical bivariate scatterplot 
between true scores and errors of measurement. Since this scatterplot cannot 
be constructed empirically, it will be necessary here to develop some theory 
for making the necessary inferences. 

It will be assumed that when examinee a takes test u, the error of
measurement $e_{ua}$ is a chance variable with an expected (mean) value of
zero. This expected value remains zero regardless of the size of the errors
of measurement obtained by other examinees on the same test, or obtained
on other tests by the same examinee. The frequency distribution of $e_{ua}$
is assumed to be the same for all examinees having the same true score.

*This research was in part carried out under Contracts Nonr-2214(00) and
Nonr-2752(00) with the Office of Naval Research, Department of the Navy.

The true score of examinee a will be denoted by $\xi_{ua}$. It will be defined
simply as the difference between the observed score $x_{ua}$ on test u and the
error of measurement:

(1)  $\xi_{ua} = x_{ua} - e_{ua}$.

No other definition will be needed. $\xi_{ua}$, for a given examinee, is a fixed
quantity; it becomes a random variable only when a process of sampling examinees
is considered.

When test u is given to a population of examinees, there is for each
examinee a single value of each of the three variables $\xi_u$, $x_u$, and $e_u$. The
first moments $\mu$ and second moments $\sigma^2$ of these three variables for the
population of examinees are known from conventional test theory to have
the following relations (it can be seen that these may also be derived from
the assumptions already made here):

(2)  $\mu_{\xi_u} = \mu_{x_u}$,
(3)  $\sigma^2_{\xi_u} = \sigma^2_{x_u} - \sigma^2_{e_u}$.

If, further, two tests u and v have the same true score so that $\xi_{ua} = \xi_{va}$ for
all a, then

(4)  $\sigma^2_{\xi} = \sigma_{x_u x_v}$,

the term on the right being the covariance between $x_u$ and $x_v$. Thus, from
(3) and (4),

(5)  $\sigma^2_{e_u} = \sigma^2_{x_u} - \sigma_{x_u x_v}$;   $\sigma^2_{e_v} = \sigma^2_{x_v} - \sigma_{x_u x_v}$.

Note that tests u and v have not been assumed to be strictly parallel; thus
the variance of the errors of measurement may be different for the two
tests. Finally, by assumption,

(6)  $\sigma_{\xi e_u} = \sigma_{\xi e_v} = \sigma_{e_u e_v} = 0$.
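
As an illustration of (4) and (5), the following minimal Python sketch simulates two tests that share a true score and recovers the two error variances from observable moments alone. The simulated distributions, the sample size, and the function names are assumptions made for this example; they are not part of the study.

```python
import random


def covariance(a, b):
    """Unbiased sample covariance of two equally long score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def error_variances(x_u, x_v):
    """Error variances of tests u and v, following equations (4) and (5)."""
    true_var = covariance(x_u, x_v)          # (4): covariance estimates true-score variance
    var_eu = covariance(x_u, x_u) - true_var  # (5) for test u
    var_ev = covariance(x_v, x_v) - true_var  # (5) for test v
    return var_eu, var_ev


# Hypothetical data: one true score per examinee, independent errors with
# standard deviations 2.0 (test u) and 1.0 (test v); the tests are not parallel.
rng = random.Random(0)
true_scores = [rng.gauss(12.0, 3.0) for _ in range(20000)]
x_u = [t + rng.gauss(0.0, 2.0) for t in true_scores]
x_v = [t + rng.gauss(0.0, 1.0) for t in true_scores]

var_eu, var_ev = error_variances(x_u, x_v)
print(f"estimated error variance, test u: {var_eu:.2f}")
print(f"estimated error variance, test v: {var_ev:.2f}")
```

The two estimates should land near the simulated error variances (4 and 1), even though the true scores themselves are never observed.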


Higher Order Moments for True Scores and Errors of Measurement 


Equations (2), (4), (5), and (6) express the first- and second-order
moments of true scores and errors of measurement in terms of the observable
moments of $x_u$ and $x_v$. The purpose here is to obtain similar formulas for
third-order moments. It will then be shown how these third-order moments
can be used to test the hypotheses of the normality and independence of
the errors of measurement. Some new notation will be needed. It will be
helpful to work with the deviation-scores $z_{ua} = x_{ua} - \mu_{x_u}$ and
$\zeta_{ua} = \xi_{ua} - \mu_{\xi_u}$. By (2) and (1),

(7)  $z_{ua} = \zeta_{ua} + e_{ua}$.


Let E denote the operation of taking an expected value (average value
for the entire population of examinees). Let $\mu_{ijk} = E z_u^i z_v^j z_w^k$ denote a
trivariate central moment of the observed scores on tests u, v, and w. It is
given that these tests have the same true score, $\xi$. Let
$M_{h,ijk} = E \zeta^h e_u^i e_v^j e_w^k$ denote a moment of the true score and errors of
measurement on the three tests.

In this notation equations (4), (5), and (6) become

(4')  $M_{2,000} = \mu_{110}$,
(5')  $M_{0,200} = \mu_{200} - \mu_{110}$;   $M_{0,020} = \mu_{020} - \mu_{110}$,
(6')  $M_{1,100} = M_{1,010} = M_{0,110} = 0$.

From (7), $z_u^3 = \zeta^3 + 3\zeta^2 e_u + 3\zeta e_u^2 + e_u^3$. Taking expected values
gives the result $\mu_{300} = M_{3,000} + 3M_{1,200} + 3M_{2,100} + M_{0,300}$. But
$M_{2,100} = E\zeta^2 e_u = 0$ because the expected value of $e_{ua}$ is zero for every
value of $\zeta_{ua}$. Thus

(8)  $\mu_{300} = M_{3,000} + 3M_{1,200} + M_{0,300}$.

Consider next the product
$z_u^2 z_v = \zeta^3 + 2\zeta^2 e_u + \zeta e_u^2 + \zeta^2 e_v + 2\zeta e_u e_v + e_u^2 e_v$.
Now, $M_{2,100} = M_{2,010} = 0$, as already noted, and $M_{1,110} = 0$
and $M_{0,210} = 0$ for similar reasons. Thus,

(9)  $\mu_{210} = M_{3,000} + M_{1,200}$.

It is similarly found that

(10)  $\mu_{111} = M_{3,000}$.

Additional formulas may be obtained from (8) and (9) by permuting
subscripts, leaving those referring to true score untouched. For example,
from (9),

(11)  $\mu_{120} = M_{3,000} + M_{1,020}$,   $\mu_{201} = M_{3,000} + M_{1,200}$,   etc.


Equations (8) through (10) and the others implied by them express
observed-score moments as sums of the moments of true scores and errors
of measurement. The equations may be solved so that the true-score and
error-of-measurement moments are uniquely expressed as sums of the
observed-score moments. The resulting equations are readily found to be:

(12)  $M_{3,000} = \mu_{111}$;

(13)  $M_{1,200} = \mu_{210} - \mu_{111} = \mu_{201} - \mu_{111}$,
      $M_{1,020} = \mu_{120} - \mu_{111} = \mu_{021} - \mu_{111}$,
      $M_{1,002} = \mu_{102} - \mu_{111} = \mu_{012} - \mu_{111}$;

(14)  $M_{0,300} = \mu_{300} - 3M_{1,200} - \mu_{111}$,
      $M_{0,030} = \mu_{030} - 3M_{1,020} - \mu_{111}$,
      $M_{0,003} = \mu_{003} - 3M_{1,002} - \mu_{111}$.

(Similar equations have been worked out for higher-order moments in [5],
but these will not be needed for present purposes.)

As already noted, all the third-order M’s not listed in (12), (13), or (14)
are, by assumption, equal to zero.
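
The solution in (12) through (14) can be tried out numerically. The sketch below is only an illustration: it uses plain sample moments rather than the unbiased estimates used later in the paper, and it assumes a hypothetical high-scoring group whose scores on three 22-item tests are binomial given the examinee's true proportion correct. All function names and simulation settings are assumptions.

```python
import random


def central_moment(cols, powers):
    """Sample central product moment, e.g. powers (2, 1, 0) gives mu_210."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    total = 0.0
    for a in range(n):
        term = 1.0
        for c, m, p in zip(cols, means, powers):
            term *= (c[a] - m) ** p
        total += term
    return total / n


def true_and_error_moments(x_u, x_v, x_w):
    """Return (M_3,000, M_1,200, M_0,300) from observed scores on three tests."""
    cols = (x_u, x_v, x_w)
    mu111 = central_moment(cols, (1, 1, 1))
    mu210 = central_moment(cols, (2, 1, 0))
    mu201 = central_moment(cols, (2, 0, 1))
    mu300 = central_moment(cols, (3, 0, 0))
    m3_000 = mu111                           # equation (12)
    m1_200 = 0.5 * (mu210 + mu201) - mu111   # equation (13), both versions averaged
    m0_300 = mu300 - 3.0 * m1_200 - mu111    # equation (14)
    return m3_000, m1_200, m0_300


def simulate(n_examinees=6000, n_items=22, seed=1):
    """Hypothetical data: three tests sharing one true proportion correct."""
    rng = random.Random(seed)
    tests = ([], [], [])
    for _ in range(n_examinees):
        p = rng.uniform(0.80, 0.98)  # assumed high-ability group
        for t in tests:
            t.append(sum(rng.random() < p for _ in range(n_items)))
    return tests


x_u, x_v, x_w = simulate()
m3, m12, m03 = true_and_error_moments(x_u, x_v, x_w)
print(f"M_3,000 = {m3:+.3f}  M_1,200 = {m12:+.3f}  M_0,300 = {m03:+.3f}")
```

Because these simulated errors are neither independent of true score nor symmetric near the test ceiling, both M_1,200 and M_0,300 should come out clearly negative.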

Equations (13) show that the assumptions made here about the errors
of measurement impose certain restrictions on the observed-score moments;
specifically, that

(15)  $\mu_{210} = \mu_{201}$,
      $\mu_{120} = \mu_{021}$,
      $\mu_{102} = \mu_{012}$.

It should be noted that up to this point parallelism of the actual tests has
not been assumed; thus moments such as $\mu_{210}$, $\mu_{120}$, and $\mu_{102}$ will not in
general be equal to each other.

The M’s in (13) and (14) are useful statistics for investigating the
normality and independence hypotheses about the errors of measurement.
In the first place, if the errors of measurement on test u (say) are distributed
independently of true score, then $E\zeta e_u^2 = E\zeta \, Ee_u^2 = 0$ because
the expectation of the product of two independent variables is equal to
the product of their expectations, and the deviation score $\zeta$ has expectation
zero. Thus each M in (13) provides a test for what will be called the
independence hypothesis.

Secondly, each M in (14) is a third moment and thus measures the
skewness of the frequency distribution of the corresponding error of measurement.
If the distribution of the errors is a symmetrical one, both sides of
(14) must vanish. Of course, the errors of measurement might be symmetrically
but nonnormally distributed; however, if they are found to be
unsymmetrically distributed, they are surely not normally distributed. Thus
(14) provides a test (even though not an entirely efficient one) for the normality
hypothesis. (Note that this hypothesis refers to the distribution of
the errors of measurement; the distributions of the true scores and of the
observed scores may be of any shape at all, as far as the present development
is concerned.)

If a large random sample of examinees is drawn from the population 
of examinees, the sample values corresponding to the M’s in (13) and (14) 
will differ from the M’s because of sampling fluctuations. Significance tests 
can then be carried out to determine whether these sample values differ 
from zero more than could be accounted for under the normality and inde- 
pendence hypotheses. This is possible because (i) the formula for the large- 
sample standard error of each such sample value can be derived under 
the normality and independence hypotheses, (ii) each such sample value 
is a function of moments, and is thus known to be approximately normally 
distributed in large samples ([2], sec. 28.4). This is the procedure that was 
carried through in order to make significance tests on the data described
below.


Data 


A 150-item vocabulary (synonyms) test was administered to a nation- 
wide sample of about 13,000 college and university seniors. A very few 
examinees who did not reach item 144 were excluded from the study, as 
were items 145-150. Item difficulty and discrimination indices were com- 
puted from a random sample of 2,500 examinees. These were used to assign 
most of the 144 items to one of six nonoverlapping subtests: a “control
test” of 24 representative items and five 22-item tests selected with great
care so as to be as parallel as possible.

The control test was used only to select four groups of 1,000 examinees 
each: 


Group L: the examinees with the lowest observed scores (x = 0 to 8),
Group M: examinees with middle scores (x = 15),
Group H: those with highest scores (x = 22 to 24),
Group O: a spaced sample of all remaining examinees.


The answer sheets were scored for each of the five parallel 22-item tests. 

The means and the covariance matrix of the observed scores on the
five tests were computed for Group O. Wilks’ procedure ([3], ch. 14) was
applied to determine whether the five tests could really be considered parallel.
The null hypothesis that all five tests have the same population means,
the same variances, and the same covariances was accepted at the .05 level.
This is a severe test, particularly since N = 1000. Since only four parallel
tests were desired, the seemingly least parallel of the five tests was excluded
from further analysis.

TABLE 1

Frequency Distribution of Observed Scores, Together with the
Estimated Distribution of True Scores, for Four Groups of Examinees

Score   Group O          Group H          Group M          Group L
        Observed  True   Observed  True   Observed  True   Observed  True

22 4 $1
21 y 150 , Ak a
20 32 12 202 267 5
19 52 38 202 290 16 1
18 63 67 151 198 31 9 8
17 83 92 109 105 54 35 2
16 105 108 65 46 93 7 2
15 98 115 35 17 118 124 5 uf
14 93 115 20 6 119 162 9 3
13 95 107 9 e 130 AI7 16 6
12 86 gk 5 118 =: 163 29 15
11 7 78 2 114 126 48 31
10 65 61 80 79 72 58
9 47 45 56 37 92 96
8 32 31 3h 10 114 139
7 22 19 19 1 124 = «173
6 17 11 q 130 179
5 10 5 4 114 149
4 6 2 1 109 94
3 > 69 ko
2 2 44 11
1 17 1
0 5
Total 1000 1000 1000 1002 1000 1001 1000 998

Reproduced, with permission, from Educ. psychol. Measmt., 1959, 19, 335.

The nature of the four groups studied is best seen from Table 1, which 
presents the observed-score frequency distribution of each, averaged over 
the four parallel 22-item tests. Table 1 presents, only incidentally, an estimated
true-score distribution for each group, obtained by fitting a Pearson
Type I distribution to the first four true-score moments, which had been
estimated by an obvious extension [5] of the method described in the present
paper.


Empirical Results 


Unbiased sample estimates ($\hat\mu$’s) were obtained (see next section) for
all the multivariate second- and third-order moments of the observed scores,
separately for each group of examinees. The third-order moments are listed,
to one decimal place accuracy, in Table 2. Since the four tests under study
are all parallel, it is appropriate to average corresponding moments. These
averages are indicated in the table by the rows labeled $\hat\mu_{111}$, $\hat\mu_{21}$, $\hat\mu_{3}$, and
$\hat\mu_{11}$, $\hat\mu_{2}$ (these averages are computed from $\hat\mu$’s carried to more decimal
places than those shown in the table).

TABLE 2

Unbiased Estimates of the Multivariate Second and Third Moments
of the Observed Scores for Four Groups of Examinees

Symbol                Group O    Group H    Group M    Group L

$\hat\mu_{11}$          10.391      2.002      4.409      4.932
$\hat\mu_{2}$           14.139      4.119      8.355      8.924

$\hat\mu_{0111}$        -12.4       -3.0       -0.3       +5.0
$\hat\mu_{1011}$        -11.6       -1.8       +1.3       +4.1
$\hat\mu_{1101}$        -11.2       -2.4       -0.8       +3.7
$\hat\mu_{1110}$        -11.7       -2.2       -0.3       +4.5

$\hat\mu_{111}$         -11.728     -2.358     -0.040     +4.309

$\hat\mu_{0012}$        -13.5       -3.5       +0.6       +5.2
$\hat\mu_{0021}$        -14.0       -3.3       -1.0       +5.7
$\hat\mu_{0102}$        -14.1       -4.4       -0.7       +5.8
$\hat\mu_{0120}$        -15.2       -4.2       -2.6       +5.6
$\hat\mu_{0201}$        ~AD69       -4.5       -1.1       +7.4
$\hat\mu_{0210}$        -14.1       -4.3       -2.2       +6.8
$\hat\mu_{1002}$        -12.8       -3.3       -0.0       +4.5
$\hat\mu_{1020}$        -12.9       -2.6       -0.2       +4.7
$\hat\mu_{1200}$        -13.2       -3.6       -2.8       +7.6
$\hat\mu_{2001}$        -13.5       -2.8       -0.3       +3.5
$\hat\mu_{2010}$        -11.8       -2.1       +0.0       +5.8
$\hat\mu_{2100}$        -12.6       -2.5       -2.5       +6.3

$\hat\mu_{21}$          -13.482     -3.422     -1.054     +5.743

$\hat\mu_{0003}$        -17.4       -6.2       -0.1       +7.1
$\hat\mu_{0030}$        -20.2       -6.5       oh,        +8.8
$\hat\mu_{0300}$        -17.7       -7.4       -2.3       +11.6
$\hat\mu_{3000}$        -17.0       -5.4       -3.0       +11.4

$\hat\mu_{3}$           -18.063     -6.386     -2.360     +9.725













In the case where four parallel tests have been administered, equations
(12), (13), and (14), rewritten in terms of sample estimates, become
respectively

(16)  $\hat M_{3,0000} = \hat\mu_{111}$,
(17)  $\hat M_{1,2000} = \hat M_{1,0200} = \cdots = \hat\mu_{21} - \hat\mu_{111}$,
(18)  $\hat M_{0,3000} = \hat M_{0,0300} = \cdots = \hat\mu_{3} - 3\hat\mu_{21} + 2\hat\mu_{111}$.

To simplify notation for the present case where the test forms are parallel,
the quantities in (16), (17), and (18) will be denoted by $\hat M_{3,0}$, $\hat M_{1,2}$, and
$\hat M_{0,3}$, respectively.

If the errors of measurement were distributed independently of true
score, then $\hat M_{1,2}$ would be zero, except for sampling fluctuations. The meaning
of $\hat M_{1,2}$ is perhaps best understood from the fact that $M_{1,2}$ is the numerator
in the formula for the correlation between the true score and the squared
standard error of measurement. The values of $\hat M_{1,2}$ are shown in line 2
of Table 3. The third line of the table gives the ratio of $\hat M_{1,2}$ to its standard
error. It is seen that in all four groups the association between true scores
and errors of measurement is much larger than could be plausibly accounted
for under the hypotheses of independence and normality.

TABLE 3

Test Statistic and Critical Ratio for Testing the
Independence and the Normality Hypotheses

Symbols                                          Group O   Group H   Group M   Group L

$\hat M_{0,2}$                                     3.75      2.12      3.95      3.99
$\hat M_{1,2}$                                    -1.754    -1.063    -1.013    +1.434
$\hat M_{1,2} / \sqrt{\mathrm{Var}\,\hat M_{1,2}}$   -5.07    -10.65     -3.79     +5.10
$\hat M_{0,3}$                                    -1.074    -0.838    +0.721    +1.114
$\hat M_{0,3} / \sqrt{\mathrm{Var}\,\hat M_{0,3}}$   -2.54     -4.30     +1.45     +2.21
$\sqrt{\beta_1}$                                   -.15      -.27      +.09      +.14

$\hat M_{0,3}$ is a measure of the skewness of the error distribution. If the errors
of measurement were symmetrically distributed, then $\hat M_{0,3}$ would be zero
except for sampling fluctuations. Table 3 shows that in three of the four
groups the skewness of the error distribution differs from zero by more
than could be plausibly accounted for under the hypotheses of independence
and normality. The fact that the skewness is not significant in Group M
is accounted for by the fact that 99 percent of the observed scores in this
specially selected group fall in the range from 6 to 19, and 98 percent of
the true scores are estimated to fall in the range from 9 to 17; thus the group
contains none of the extreme true scores that are accompanied by highly
skewed distributions of errors of measurement.

Table 3 also presents the values of $\hat M_{0,2}$ and the measure of skewness
$\sqrt{\beta_1} = \hat M_{0,3} / \hat M_{0,2}^{3/2}$. It is seen that the skewness of the errors of
measurement is greatest for the most extreme groups, as would be expected a priori.

The results of the empirical study are seen to be, in general, incompatible
with the hypotheses that the errors of measurement are distributed
normally and independently of the true scores. The deviation from each
of these hypotheses seems to be greatest for the most extreme group (Group
H). It may be assumed, further, that within Group H, the deviations are
greatest for those examinees having true scores closest to the test ceiling,
$\xi = 22$.

The results obtained appear to be generally consistent with the hypothesis
that for given $\xi$, the errors are distributed in such a way that $x = \xi + e$
has a binomial distribution with probability of success $\xi/J$, where $J$ is the
number of items in the test. This hypothesis presumably holds exactly
only for the case of randomly parallel forms [6, 7], but could be expected
to hold approximately when the parallel forms are matched rather than
random, as in the present case. Additional numerical results suggest that
in the present data the standard error of measurement tends to be slightly
larger for low true scores than for high true scores. Such a finding would be
expected when the examinee has a chance to guess correctly whenever he
does not know the answer to an item. This result would not be expected
on randomly parallel test forms, but would be expected on carefully matched
forms, according to the line of reasoning given in ([6], pp. 518-519).
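
To make the binomial hypothesis concrete, the sketch below evaluates the standard binomial moment formulas for the error of measurement at a given true score. It is an illustration only; the function name is an assumption, and the 22-item length is taken from the tests described above.

```python
def binomial_error_moments(true_score, n_items):
    """Variance, third central moment, and skewness of the error x - true_score
    when x is binomial with n_items trials and success probability
    true_score / n_items (so that the error has mean zero)."""
    p = true_score / n_items
    variance = n_items * p * (1.0 - p)
    third = variance * (1.0 - 2.0 * p)
    skewness = third / variance ** 1.5 if variance > 0 else 0.0
    return variance, third, skewness


for xi in (4, 11, 20):
    var, third, skew = binomial_error_moments(xi, 22)
    print(f"true score {xi:2d}: variance {var:.3f}, "
          f"third moment {third:+.3f}, skewness {skew:+.3f}")
```

Under this model the error variance changes with true score, and the skewness is positive for low true scores, zero at the midpoint, and negative near the ceiling, which is the sign pattern shown for Groups L and H in Table 3.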


Unbiased Estimates of Population Moments 


Unbiased sample estimates of the third-order population moments of
the observed scores are provided by third-order k-statistics ([4], ch. 11).
These unbiased estimates have up to this point been designated by $\hat\mu$ with
appropriate subscripts; hereafter the symbol $\hat\mu$ will in most cases be replaced
by k to fit in with the general notation for k-statistics. For example,
with the z’s taken as deviations from the sample means,

(19)  $k_{0200} = \frac{1}{N-1} \sum_{a=1}^{N} z_{va}^2$,
      $k_{1010} = \frac{1}{N-1} \sum_{a=1}^{N} z_{ua} z_{wa}$,
      $k_{0300} = \frac{N}{(N-1)(N-2)} \sum_{a=1}^{N} z_{va}^3$,
      $k_{0120} = \frac{N}{(N-1)(N-2)} \sum_{a=1}^{N} z_{va} z_{wa}^2$,
      $k_{1110} = \frac{N}{(N-1)(N-2)} \sum_{a=1}^{N} z_{ua} z_{va} z_{wa}$.


In the present study, the four tests for which data were available were
all parallel tests. Under the assumption of parallelism, the subscripts on
any $\mu$ or k may be permuted without changing its value; likewise the
subscripts after the comma on any M.

With four parallel tests, unbiased estimates of the third-order population
$\mu$’s are

(20)  $\hat\mu_{111} = \frac{1}{4} (k_{1110} + k_{1101} + k_{1011} + k_{0111})$,
      $\hat\mu_{21} = \frac{1}{12} (k_{2100} + k_{2010} + \cdots + k_{0012})$,
      $\hat\mu_{3} = \frac{1}{4} (k_{3000} + k_{0300} + k_{0030} + k_{0003})$,

all unnecessary zeros having been dropped from the symbols at the left.
From (12), (13), and (14), unbiased estimates of the M’s are then given
by (16), (17), and (18).
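
The averaging in (20) and the estimates in (17) and (18) can be sketched in a few lines of Python. This is a minimal illustration, not the computing procedure used in the study; the function names and the tiny data set are assumptions.

```python
from itertools import combinations, permutations


def k3(scores, i, j, l):
    """Third-order k-statistic for tests i, j, l (indices may repeat),
    computed from deviations about the sample means."""
    n = len(scores)
    means = [sum(row[t] for row in scores) / n for t in (i, j, l)]
    total = sum((row[i] - means[0]) * (row[j] - means[1]) * (row[l] - means[2])
                for row in scores)
    return n * total / ((n - 1) * (n - 2))


def averaged_estimates(scores):
    """Average the k-statistics over parallel tests as in equation (20),
    then form the estimates of equations (17) and (18)."""
    tests = range(len(scores[0]))
    k111 = [k3(scores, i, j, l) for i, j, l in combinations(tests, 3)]
    k21 = [k3(scores, i, i, j) for i, j in permutations(tests, 2)]
    k30 = [k3(scores, i, i, i) for i in tests]
    mu111 = sum(k111) / len(k111)   # 4 terms when there are four tests
    mu21 = sum(k21) / len(k21)      # 12 terms when there are four tests
    mu3 = sum(k30) / len(k30)       # 4 terms when there are four tests
    return {
        "mu111": mu111,
        "mu21": mu21,
        "mu3": mu3,
        "M12": mu21 - mu111,                     # equation (17)
        "M03": mu3 - 3.0 * mu21 + 2.0 * mu111,   # equation (18)
    }


# Hypothetical scores of five examinees on four tests (one row per examinee).
example = [
    [10, 11, 9, 10],
    [15, 14, 16, 15],
    [20, 19, 21, 20],
    [12, 13, 12, 11],
    [18, 18, 17, 19],
]
print(averaged_estimates(example))
```

With real data each row would hold one examinee's scores on the four parallel 22-item tests, and the function would be applied separately to each group.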


Sampling Variances and Covariances 


There remains only the problem of obtaining the sampling variances 
of the M’s in (17) and (18). First of all, sampling variances and covariances 
for the k’s will be needed. Those involving only one or two variables are 
given in [1]. The others required were worked out by the method described 
in ([4], sec. 11.25); the resulting formulas are listed in the appendix for 
future reference. Each formula is a linear function of the population cumulants
($\kappa$’s). Other formulas are obtained from these by permutation of subscripts.

Next, the null hypotheses of normality and independence were assumed
(note that although the errors of measurement have been assumed to be
normally distributed in accordance with the null hypothesis, no assumption
has been made about the shape of the distribution either of the true scores
or of the observed scores). Thus all the quantities on the left of (13) and
(14) vanish, so that
$\mu_{300} = \mu_{030} = \mu_{003} = \mu_{210} = \cdots = \mu_{012} = \mu_{111}$.

A similar set of equations is needed for the higher moments. It will
be necessary to use the following.

THEOREM. Given random variables $x_i$ ($i = 1, 2, \ldots, n$) where
$x_i = \xi_i + e_i$ for all i, the $e_i$ being distributed normally and independently of each
other and of the $\xi_i$, then all multivariate cumulants of the $x_i$ are equal to those
of the $\xi_i$ excepting only that $\sigma^2_{x_i} = \sigma^2_{\xi_i} + \sigma^2_{e_i}$.


TABLE 4

Coefficients in the Formulas for the Sampling Variances and Covariances of
the Third-Order Multivariate k-Statistics of the Observed Scores
When the Tests Are Parallel and the Errors of Measurement
Are Normally Distributed, Independently of True Score

[The body of Table 4 is not legible in this copy.]
















The proof of this theorem need not be given here since it follows exactly
that given in [7] for the univariate case. When the $\xi_i$ are all identical (this
is the present case where we are dealing with parallel tests), then the
multivariate cumulant $\kappa_{rs \cdots t}$ is the same as the univariate cumulant
$\kappa_{r+s+\cdots+t}$ of every $\xi_i$. This leads to the following.

COROLLARY. When the conditions of the theorem are satisfied and in addition
the $\xi_i$ are all identical (i.e., when the $x_i$ are all ‘measures of the same true value’),
then, except for the variances, all the multivariate cumulants of the $x_i$ of a given
order are equal.

Note that the theorem and corollary do not depend on the assumption that
different errors of measurement have equal variances.
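
The reason behind the theorem can be outlined with cumulant generating functions. What follows is only a sketch under the theorem's own assumptions, not the proof given in [7]; the symbols $K_x$, $K_\xi$, $K_e$, and $t_i$ are notation introduced here for the illustration.

```latex
% K_x, K_\xi, K_e: joint cumulant generating functions (notation assumed here)
\begin{align*}
K_x(t_1,\dots,t_n)
  &= \log E \exp\Bigl(\sum_i t_i x_i\Bigr)
   = K_\xi(t_1,\dots,t_n) + K_e(t_1,\dots,t_n)
  && \text{(the $e_i$ are independent of the $\xi_i$)} \\
K_e(t_1,\dots,t_n)
  &= \tfrac{1}{2} \sum_i \sigma_{e_i}^2 \, t_i^2
  && \text{(independent normal errors with zero means)}
\end{align*}
```

Since $K_e$ contributes only the pure second-degree terms $t_i^2$, every cumulant of the $x_i$ other than the variances coincides with the corresponding cumulant of the $\xi_i$, and each variance is increased by $\sigma^2_{e_i}$.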

The formulas for the sampling variances and covariances of the k’s
are greatly simplified by use of this corollary together with the assumption
of parallelism. Each simplified formula is simply a linear function of the
terms given at the tops of the columns in Table 4; the numerical coefficients
with which these terms appear in the formula are given in the body of the
table. The symbol $N^{(3)} = N(N-1)(N-2)$. Thus

$\mathrm{Cov}(k_{300}, k_{021}) = \frac{\kappa_6}{N} + \frac{9\kappa_4\kappa_{11}}{N-1} + \frac{9\kappa_3^2}{N-1} + \frac{6N^2\kappa_{11}^3}{N^{(3)}}$.

TABLE 5

Coefficients in the Formulas for the Sampling Variances and Covariances
of the $\hat\mu$ and the $\hat M$ (under the same conditions as in Table 4)

Column terms:
(a) $\kappa_6 / N$                (e) $N^2 \kappa_2^3 / N^{(3)}$
(b) $\kappa_4 \kappa_2 / (N-1)$       (f) $N^2 \kappa_2^2 \kappa_{11} / N^{(3)}$
(c) $\kappa_4 \kappa_{11} / (N-1)$    (g) $N^2 \kappa_2 \kappa_{11}^2 / N^{(3)}$
(d) $\kappa_3^2 / (N-1)$          (h) $N^2 \kappa_{11}^3 / N^{(3)}$

Variances and Covariances                          (a)   (b)   (c)   (d)   (e)   (f)   (g)   (h)

Var $\hat\mu_{111}$                          1/4  (    4     9    27    36     1     3     9    11 )*
Var $\hat\mu_{21}$                           1/12 (   12    27    81   108     2     8    32    30 )*
Var $\hat\mu_{3}$                            1/4  (    4     9    27    36     6     0     0    18 )*
Cov ($\hat\mu_{111}$, $\hat\mu_{21}$)        1/4  (    4     9    27    36     0     4    10    10 )*
Cov ($\hat\mu_{111}$, $\hat\mu_{3}$)         1/4  (    4     9    27    36     0     0    18     6 )*
Cov ($\hat\mu_{21}$, $\hat\mu_{3}$)          1/12 (   12    27    81   108     0    18    18    36 )*
Var $\hat M_{1,2}$                           1/12 (    0     0     0     0     5    -7    -1     3 )*
Var $\hat M_{0,3}$                                     0     0     0     0     4   -12    12    -4

*Each integer in parentheses is to be multiplied by the fraction preceding
the parentheses.


Formulas for the sampling variances and covariances of the $\hat\mu$’s are
readily obtained from those of the k’s. The results are given in the top half
of Table 5.

These, finally, are used to produce the formulas for the sampling variances
of the $\hat M$’s shown in the last two lines of Table 5. They reduce to a
remarkably simple form, involving second-degree cumulants only:

(21)  $\mathrm{Var}\,\hat M_{1,2} = \frac{N(\kappa_2 - \kappa_{11})^2 (5\kappa_2 + 3\kappa_{11})}{12(N-1)(N-2)}$,

(22)  $\mathrm{Var}\,\hat M_{0,3} = \frac{4N(\kappa_2 - \kappa_{11})^3}{(N-1)(N-2)}$.

Each sample k is an unbiased estimate of the corresponding $\kappa$. In large
samples, the $\kappa$’s in (21) and (22) may be replaced by an average of all the
corresponding sample k’s to obtain a sample estimate of the true sampling
variances. To an adequate approximation the quantities N - 1 and N - 2
may be replaced by N, as has been done for the present calculations.
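
As a worked check on formulas (21) and (22), the sketch below recomputes the two critical ratios for Group H from the averaged second moments in Table 2 and the test statistics in Table 3, with N - 1 and N - 2 replaced by N as described. The function name is an assumption, and the input values are simply read from the tables.

```python
from math import sqrt


def critical_ratios(mu2, mu11, m12, m03, n):
    """Ratios of M_1,2 and M_0,3 to their large-sample standard errors.

    mu2 and mu11 stand in for the cumulants kappa_2 and kappa_11; the factor
    N / ((N - 1)(N - 2)) in (21) and (22) is approximated by 1 / N."""
    error_var = mu2 - mu11
    var_m12 = error_var ** 2 * (5.0 * mu2 + 3.0 * mu11) / (12.0 * n)  # (21)
    var_m03 = 4.0 * error_var ** 3 / n                                  # (22)
    return m12 / sqrt(var_m12), m03 / sqrt(var_m03)


# Group H: averaged second moments from Table 2, test statistics from Table 3.
ratio_12, ratio_03 = critical_ratios(
    mu2=4.119, mu11=2.002, m12=-1.063, m03=-0.838, n=1000
)
print(f"independence ratio: {ratio_12:.2f}")
print(f"normality ratio:    {ratio_03:.2f}")
```

The results should agree with the Group H entries of Table 3 to within rounding of the tabled inputs.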


Appendix 


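This appendix gives the formulas for Var $k_{111}$ and for all sampling
covariances between third-order k-statistics involving four variables. All
formulas for sampling covariances between third-order k-statistics involving
three or fewer variables can be obtained from those given here by adding
appropriate subscripts together.

$n = N - 1$,
$W = (N - 1)(N - 2)/N$.

$\mathrm{Var}\,k_{111} = N^{-1}\kappa_{222} + n^{-1}(\kappa_{220}\kappa_{002} + \kappa_{202}\kappa_{020} + \kappa_{022}\kappa_{200})$
    $+ 2n^{-1}(\kappa_{211}\kappa_{011} + \kappa_{121}\kappa_{101} + \kappa_{112}\kappa_{110} + \kappa_{210}\kappa_{012} + \kappa_{201}\kappa_{021} + \kappa_{120}\kappa_{102})$
    $+ 3n^{-1}\kappa_{111}^2 + W^{-1}(\kappa_{200}\kappa_{020}\kappa_{002} + \kappa_{200}\kappa_{011}^2 + \kappa_{020}\kappa_{101}^2 + \kappa_{002}\kappa_{110}^2)$
    $+ 2W^{-1}\kappa_{110}\kappa_{101}\kappa_{011}$.

$\mathrm{Cov}(k_{2100}, k_{0012}) = N^{-1}\kappa_{2112} + n^{-1}(4\kappa_{1001}\kappa_{1111} + \kappa_{0110}\kappa_{2002} + 2\kappa_{1010}\kappa_{1102}$
    $+ 2\kappa_{0101}\kappa_{2011} + \kappa_{2010}\kappa_{0102} + 2\kappa_{2001}\kappa_{0111} + 2\kappa_{1002}\kappa_{1110}$
    $+ 4\kappa_{1101}\kappa_{1011}) + W^{-1}(4\kappa_{1010}\kappa_{1001}\kappa_{0101} + 2\kappa_{1001}^2\kappa_{0110})$.

$\mathrm{Cov}(k_{1110}, k_{0111}) = N^{-1}\kappa_{1221} + n^{-1}(\kappa_{0200}\kappa_{1021} + \kappa_{0020}\kappa_{1201} + \kappa_{1100}\kappa_{0121}$
    $+ \kappa_{0011}\kappa_{1210} + \kappa_{1010}\kappa_{0211} + \kappa_{0101}\kappa_{1120} + \kappa_{1001}\kappa_{0220}$
    $+ 2\kappa_{0110}\kappa_{1111} + \kappa_{1200}\kappa_{0021} + \kappa_{1020}\kappa_{0201} + 2\kappa_{0120}\kappa_{1101}$
    $+ 2\kappa_{0210}\kappa_{1011} + 3\kappa_{1110}\kappa_{0111}) + W^{-1}(\kappa_{0200}\kappa_{0020}\kappa_{1001}$
    $+ \kappa_{0200}\kappa_{1010}\kappa_{0011} + \kappa_{0020}\kappa_{0101}\kappa_{1100} + \kappa_{1100}\kappa_{0110}\kappa_{0011}$
    $+ \kappa_{1010}\kappa_{0110}\kappa_{0101} + \kappa_{1001}\kappa_{0110}^2)$.

$\mathrm{Cov}(k_{1110}, k_{2001}) = N^{-1}\kappa_{3111} + n^{-1}(2\kappa_{1100}\kappa_{2011} + 2\kappa_{1010}\kappa_{2101} + \kappa_{1001}\kappa_{2110}$
    $+ \kappa_{0101}\kappa_{3010} + \kappa_{0011}\kappa_{3100} + 2\kappa_{2000}\kappa_{1111} + \kappa_{3000}\kappa_{0111}$
    $+ 3\kappa_{2100}\kappa_{1011} + 3\kappa_{2010}\kappa_{1101} + 2\kappa_{2001}\kappa_{1110}) + 2W^{-1}(\kappa_{2000}\kappa_{1100}\kappa_{0011}$
    $+ \kappa_{2000}\kappa_{1010}\kappa_{0101} + \kappa_{1100}\kappa_{1010}\kappa_{1001})$.

$\mathrm{Cov}(k_{1110}, k_{0012}) = N^{-1}\kappa_{1122} + n^{-1}(\kappa_{0020}\kappa_{1102} + \kappa_{1010}\kappa_{0112} + \kappa_{0110}\kappa_{1012}$
    $+ 2\kappa_{1001}\kappa_{0121} + 2\kappa_{0101}\kappa_{1021} + 2\kappa_{0011}\kappa_{1111} + \kappa_{0012}\kappa_{1110}$
    $+ \kappa_{0102}\kappa_{1020} + \kappa_{1002}\kappa_{0120} + 2\kappa_{0021}\kappa_{1101} + 4\kappa_{0111}\kappa_{1011})$
    $+ 2W^{-1}(\kappa_{0020}\kappa_{1001}\kappa_{0101} + \kappa_{1010}\kappa_{0101}\kappa_{0011} + \kappa_{1001}\kappa_{0110}\kappa_{0011})$.

$\mathrm{Cov}(k_{1110}, k_{0003}) = N^{-1}\kappa_{1113} + 3n^{-1}(\kappa_{1001}\kappa_{0112} + \kappa_{0101}\kappa_{1012} + \kappa_{0011}\kappa_{1102}$
    $+ \kappa_{1101}\kappa_{0012} + \kappa_{1011}\kappa_{0102} + \kappa_{0111}\kappa_{1002}) + 6W^{-1}\kappa_{1001}\kappa_{0101}\kappa_{0011}$.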


REFERENCES 


[1] Cook, M. B. Bi-variate k-statistics and cumulants of their joint sampling distribution. 
Biometrika, 1951, 38, 179-195. 

[2] Cramér, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1946. 

[3] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

[4] Kendall, M. G. The advanced theory of statistics. (5th ed.) New York: Hafner, 1952. 
2 vols. 

[5] Lord, F. M. The joint cumulants of true values and errors of measurement. Ann. math. 
Statist., 1959, 30, 1000-1004. 

[6] Lord, F. M. Do tests of the same length have the same standard error of measurement? 
Educ. psychol. Measmt., 1957, 17, 510-521. 

[7] Lord, F. M. Statistical inferences about true scores. Psychometrika, 1959, 24, 1-18. 


Manuscript received 2/6/59

















BOOK REVIEWS 


YRJÖ AHMAVAARA. On the Unified Factor Theory of Mind. Helsinki: Suomalaisen Tiedeakatemia,
1957. Annales Akademiae Scientiarum Fennicae, Ser. B., Vol. 106. Pp. 176.


In Part I, entitled “On the Factorial Description of Mind,” the author develops his
concept of a “Unified Factor Theory” and presents a short outline of his earlier
“Transformation Analysis” [1]. In Part II, entitled “On the Theory of Abilities,” he uses his
transformation analysis for the comparison of several studies reported in the literature.

To compare Ahmavaara’s “Unified Factor Theory” with similar expositions on the
logic and philosophical basis of factor analysis would be a rather hopeless undertaking in 
that the number of such presentations is very large. It is the opinion of this reviewer that 
the most careful and comprehensive accounts of the philosophical basis of factor analysis 
remain those contained in the books by L. L. Thurstone [18] and Sir Godfrey Thomson [17]. 
Ahmavaara, following the terminology of Cohen and Nagel [7], places factor analysis in the 
domain of abstractive theories which encompass, among others, the theories of relativity, 
evolution, and quantum theory, in contradistinction to mechanistic theories which include 
field theories, the theory of atomic structure of matter, and current theories of learning. 
The latter are characterized by pictures or visual presentations, whereas the former describe 
relationships without recourse to mechanistic models. The author suggests that factor 
analysis represents the only example in psychology of an abstractive theory and attributes 
misunderstanding in this area to attempts at giving a mechanistic interpretation to the 
factors. He argues that the abstractive theory is a precursor of the mechanistic one which 
may, some time in the future, supersede it. It is not the goal of this abstractive theory of 
factor analysis to find a hypothetical inner mechanism to explain the results, but to identify 
some underlying simple order in terms of the fewest possible number of concepts. 

While the usual textbooks of psychology start, according to the author, with the
explanation of a mechanistic model, the abstractive theory should begin with a formal
consideration of the experimental conditions. He lets the registration of some experimental
fact result in a score $a_{ikm}$, depending on the person (index i), the trait (index k), and the
situation (index m). In a graphical representation, these three bases are thought of as
dimensions of infinite extension where, however, only the situation dimension is assumed to
represent a time-dependent continuum. All qualitative observations must be quantized to yield
scores, and the author suggests item analysis, scale analysis, and latent structure analysis
as methods of quantification. He also states that all persons and traits must be numbered
and arrayed along their respective axes. He then continues to argue that all traits, situations,
and persons cannot be considered of equal importance in psychology, and that the
psychological starting point of a unified factor theory is an attempt to discover the “most revealing
situations, the principal types of personality and traits.” These “Situation, Person, and
Trait Factors” represent an “Information Packet” (of finite extent) with score
characteristics Ajam, which serve to characterize any new observation according to the “main type”
to which it belongs.

In Chapter 4 of Part I, the author states the goal of his theory, i.e., to find the method
of analysis by which the Information Packet can be obtained. He argues that the results
of experiments constitute “highly symbolic descriptions of the object, the rules of this
symbolization being ... chosen freely so as to yield the simplest theory to fit the facts.”
He postulates that this symbolism must be chosen in such a way that the psychological
scores can be added together to yield new psychological scores. He argues that this principle
of additivity is logical in view of the experimental techniques employed in psychology; in
contrast, classical physics employs a multiplicative principle (2 dynes + 3 cm is meaningless,
but 2 dynes × 3 cm = 6 dyne-cm or ergs is meaningful), and the author compares the
required additivity in psychology with the additivity of states of an atom. The additivity
principle gives rise to a linear model of factor analysis, and the author brands nonlinear
factor analysis as a misconception. He negates the superiority of a nonlinear model and
considers it merely as an alternative. This argument, which is advanced repeatedly, becomes
somewhat weakened by such statements as (pp. 38-39)


“Nonlinear relationships between persons... are not necessarily nonlinear from 
the point of view of traits, and vice versa. Thus we can, by a change of the basis of 
analysis . . . satisfy the requirement of linear relationships in numerous ways... 


“So far, at least, there has been no real need for nonlinear factor analysis, and I
suppose that there never will be” (italics supplied).


It is interesting to note how carefully the author avoids the logical consequence of this argu- 
ment, namely, that some method of normalizing or scaling must precede the analysis. The 
simple way of normalizing experimental data, by which Laplace, according to G. Darmois’s 
[8] beautiful formulation, “a linéarisé la théorie des erreurs,” has been frequently used by 
factor analysts ever since the time of Thurstone’s famous box problem, which would be 
inextricable without this device. But Laplace, Thurstone, and Darmois advocate these tech- 
niques clearly in order to represent the complex nonlinear relationship by a linear approxi- 
mation, whereas Ahmavaara expressly disavows the existence of nonlinear relations. 

As Ahmavaara starts an attempt to give mathematical expression to his logical postu- 
lates, the monograph becomes a nightmare. He starts out with the massive contention (p. 25) 


“If we add together two vectors, the result is again a vector (of the ‘linear vector
field’ spanned by those two vectors). The vector is the most general mathematical 
symbol which satisfies this requirement.” 


(Even a beginning student of linear associative algebra will cringe at this.) 
Thus, as a “direct consequence of the additivity postulate” he states that 


“the theory of mind, expressed in terms of the relations between psychological 
scores, can be constructed as isomorphic with the mathematical theory of linear 
vector fields.”


He discusses three different linear vector fields; one each for persons, traits, and situations. 
Each may be considered on the basis of the other two. According to the author, these are 
to be “independent models of mind which complement each other.” He then introduces the
scalar product of two vectors (p. 26) as an “index intended to express the smallness of the
angle between the vectors in question.” This statement becomes grave when we recognize
the author’s confusion between component and factor analysis: 


“The use of communalities or the use of (11) (factor analysis model) instead of (9) 
(component analysis model) is promoted by an endeavor to reduce the number of 
factors to a minimum. This is not, however, a necessary requirement imposed on 
the vector model, but only a practical convention” (p. 33). 


It gets more serious when we note, on the same page, that the desire for a simple structure 
is “prompted by the same endeavor to reduce the number of factors which dictated the 
use of communalities;” and it becomes entirely untenable when we read the surprising
argument on page 41. 


“Factor Analysis, as it is represented in this text, is accordingly performed from 
covariances rather than from correlations. It is a well known fact [sic!] that this is 
the only way to create a consistent factor theory. The usual factorial procedure,
starting from correlations, is only to be considered a practical approximation,
which must be replaced before long by an analysis of covariance.” 


Just how consistent can a theory be if the linear vector field representing mind can be 
changed at will by changing units and, for instance, giving a boy a score of 10 points for a 
correct answer instead of 2 points? In vectors of such varying lengths as the author postu- 
lates, even in his illustrations, the relationship between scalar products and angular separa- 
tion would not even suffice as a first approximation. 
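The reviewer's objection about units can be illustrated with a small sketch. The scores below are hypothetical, and the helper functions are written out only for this illustration; the point is that rescoring a test from 2 to 10 points per correct answer multiplies every covariance involving that test by 5, while the correlation (the cosine of the angle between the centered score vectors) is unchanged.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Population covariance of two equally long score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: number of correct answers on two tests for six pupils.
correct_a = [1, 3, 2, 5, 4, 0]
correct_b = [2, 3, 1, 4, 5, 1]

# Score test A at 2 points per correct answer, then at 10 points instead.
a_2pt = [2 * c for c in correct_a]
a_10pt = [10 * c for c in correct_a]

cov_2pt = covariance(a_2pt, correct_b)
cov_10pt = covariance(a_10pt, correct_b)
r_2pt = correlation(a_2pt, correct_b)
r_10pt = correlation(a_10pt, correct_b)

print(cov_2pt, cov_10pt)  # the covariance is multiplied by 5
print(r_2pt, r_10pt)      # the correlation is unchanged
```

A factor analysis of covariances therefore depends on the arbitrary scoring unit, whereas one based on correlations does not.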

This leads us to the “Transformation Analysis,” the central feature of the monograph,
which is employed throughout Part II. The author repeatedly and emphatically rejects the 
criticisms of statisticians and, in fact, the legitimacy of statistics as a tool at this stage of 
model building; he refers to the experimental group as an experimental population. But 
then he must admit (p. 34) that 


“every experimental factor analysis of traits, based on some limited number of sub- 
jects, establishes its own vector model, which is different from all other models 
established by other analyses of traits, with other persons as subjects. If there were 
no connections between these factor analyses with different subjects, we could not 
arrive at any general factor theory of mind.” 


The entire argument here and in the sequel is a description of samples with careful
substitution of the words “experimental population” to avoid recognition of the statistical
nature of this argument. In the development of the transformation analysis on pp. 35-36 the
author assumes two experimental populations which give rise to two factor matrices $F_1$ and
$F_2$ and derives, from his principle of additivity, that (p. 35)


“the linear relationships between the trait vectors are invariant under transforma- 
tion from the vector field of the first study to the vector field of the second study.” 


The proof of this contention is very interesting. From his formula (16) on page 36,
$F_2 = F_1 L$, he obtains a unique expression for $L$ in formula (17), i.e.,
$L = (F_1' F_1)^{-1} F_1' F_2$. Of course, he realizes that one cannot proceed from (17) to (16)
and therefore plots the elements of $F_2$ versus those of $F_1 L$, hoping that the points will
be on an x = y line. Naturally, and to nobody’s surprise, the plots on pages 54, 57, and 60,
for instance, are not very convincing and dozens of other plots are not shown. How the author
justifies this solution is beyond the mathematical comprehension of the reviewer. Does he
assume that $F_1 L = F_2 + E$ (i.e., all error in the second study) and minimize the sum of
squares of elements in $E$? That would yield the stated solution for $L$. But what if he
exchanges the numbers 1 and 2? Apart from the serious objections to this kind of
least-squares approach, what became of the transformation analysis,


“the latter being the consistent comparison method of factor studies,” (p. 36) 


if a mere change of subscripts produces two different results? 
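The asymmetry can be seen in the simplest case of a single factor, where the least-squares transformation reduces to a scalar. The loadings below are hypothetical and the function name is an assumption made for this sketch; it is not the author's procedure, only the one-column case of the least-squares formula discussed above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def least_squares_scalar(f_from, f_to):
    """Scalar L minimizing the sum of squares of f_from * L - f_to.

    For a single factor this is the one-column case of (F'F)^{-1} F'G.
    """
    return dot(f_from, f_to) / dot(f_from, f_from)

# Hypothetical loadings of four tests on one factor in two studies.
f1 = [0.8, 0.6, 0.3, 0.1]
f2 = [0.5, 0.7, 0.2, 0.4]

l_12 = least_squares_scalar(f1, f2)  # all error assigned to study 2
l_21 = least_squares_scalar(f2, f1)  # all error assigned to study 1

print(l_12, 1 / l_21)  # equal only if f1 and f2 are exactly proportional
print(l_12 * l_21)     # squared cosine of the angle between f1 and f2
```

If the comparison were symmetric, transforming study 1 into study 2 and back would return the starting point, so the product of the two scalars would be 1. It equals the squared cosine of the angle between the two loading vectors instead, which is below 1 unless the loadings are exactly proportional.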

Part II of the monograph applies this transformation analysis to 18 comparisons in 
the Reasoning-Closure Domain, six comparisons in the Verbal Domain, four in the Mechan- 
ical Domain, and three in the Musical Domain. A short interpretation of results based upon 
the comparison matrix is given for each pair of studies. Combination of results is attempted 
by using the sum of invariance values as a criterion. The “objective comparison” (p. 48) 
involves quite a number of arbitrary index numbers whose properties are not discussed. 

The author makes brief mention of other techniques and examples of comparison of 
factor studies, especially the method proposed by Tucker [19] and the classification mono- 
graph by French [9]. To these methods and catalogues he contrasts his transformation 
analysis as the objective and consistent one.

Ahmavaara compares factor analysis to differential calculus and contrasts it with 
statistics. The first two pages of his introduction are directed strongly against statistics and 
statisticians. He claims on page 12 that “the validity of the whole factor theory has been
questioned by statisticians;” the reviewer, who is a statistician, must take exception to that 
statement. If we disregard statements made by some statisticians who do not condescend 
to even an attempt at understanding the models of factor analysis (e.g., one discussion 
speaker in [13]) or who, like Ahmavaara, discuss two different types of analysis, viz., factor 
and component analysis, as if they were one and the same [11], the statisticians have been 
rather cooperative and enthusiastic about factor analysis. Shortly after psychologists like 
Sir Godfrey Thomson and L. L. Thurstone had explained the model, statisticians (Lawley 
[14], Bartlett [6]) produced solutions which, alas, are exactly obtainable only with the help 
of medium-sized electronic machinery. A psychologist who has understood the basis of fac- 
torial logic will probably profit by reading about the statistical and mathematical methods 
which are presented in a variety of studies [4, 5, 12, 15]. A problem which still awaits solu- 
tion, i.e., that of an appropriate mathematical and/or statistical analysis of the “vector 
field of persons based upon traits,” has been attacked in the last part of [10], and some
operational methods were suggested by Stephenson and others.

The problem of comparing two factor studies by statistical techniques is essentially 
solved. If we take a covariance matrix as an approximation of the correlation matrix (and 
not vice versa, as suggested by the author) the test is trivial (see, e.g., [3] and [16]). The 
derivation of a test for the equality of two correlation matrices by the likelihood ratio 
method is of the order of an exercise for a student who has studied multivariate analysis 
(in a book such as [3]). Testing equality of two factor matrices may be somewhat more 
difficult, but is still amenable to treatment by a minor extension of Lawley’s approach. The 
maximum determinant solution which L. L. Thurstone suggested in 1953 on intuitive 
grounds, and which was subsequently described in [5] and [12], leads to results identical 
with the maximum likelihood solution, and is particularly easy to handle for the comparison 
of two factorial studies. Statisticians do have methods of solution, and also available is the 
computational equipment to find them in a relatively short time. Many statisticians appre- 
ciate the ideas of factor analysis and like to help. It is sometimes difficult, however, to allay 
suspicions of skeptics if examples of mathematical legerdemain appear in the psychological 
literature. 

This reviewer enjoyed reading the monograph, and while he is doubtful about the 
value of the author’s transformation analysis, he is impressed by the clarity of the logical 
presentations of factor analysis as an abstractive theory. He preferred to view factor analysis 
as a method to identify principles of classification, and stands corrected by the author who 
would certainly regard even this unpretentious interpretation as mechanistic. While the 
reviewer would advise against use of the author’s mathematical techniques, he would
strongly recommend the monograph to philosophers, psychologists, and social scientists. 
He would also encourage the author to excerpt the monograph, leave out all mathematical 
formulation, and present such an article to a more general group of readers including, by 
all means, statisticians. 

ROLF BARGMANN
Virginia Polytechnic Institute 


REFERENCES 


[1] Ahmavaara, Y. Transformation analysis of factorial data. Helsinki: Suomalainen Tiede- 
akatemian, 1954. 

[2] Ahmavaara, Y. and Markkanen, T. The unified factor model. Helsinki: Finnish Founda-
tion for Alcohol Studies, No. 7, 1958.

[3] Anderson, T. W. Introduction to multivariate statistical analysis. New York: Wiley, 1958.

[4] Anderson, T. W. and Rubin, H. Statistical inference in factor analysis. In Proc. of the 
Third Berkeley Symp. on math. Statist. and Probability. Vol. 5. Berkeley: Univ. Cali- 
fornia Press, 1956. Pp. 111-150. 

[5] Bargmann, R. E. A study of independence and dependence in multivariate normal 
analysis. Univ. North Carolina, Inst. Statist. Mimeo. Ser. No. 186, 1957. 

[6] Bartlett, M. S. Tests of significance in factor analysis. Brit. J. Psychol., Statist. Sec.,
1950, 3, 77-85. 

[7] Cohen, M. R. and Nagel, E. An introduction to logic and scientific method. (3rd ed.) 
London: Harcourt Brace, 1951. 

[8] Darmois, G. Observations théoriques sur l’analyse factorielle linéaire et générale. Col- 
loques Intern. du Centre de la Rech. Scient., 1955, 58, 295-317. 

[9] French, J. W. The description of aptitude and achievement tests in terms of rotated 
factors. Psychometric Monogr. No. 5. Chicago: Univ. Chicago Press, 1951.

[10] Hotelling, H. Analysis of a complex of statistical variables into principal components
(second part). J. educ. Psychol., 1933, 24, 498-520.

[11] Hotelling, H. Relation of the newer multivariate statistical methods to factor analysis.
Colloques Intern. du Centre de la Rech. Scient., 1955, 58, 107-125. 

[12] Howe, W. G. Some contributions to factor analysis. Oak Ridge National Laboratory. 
ORNL No. 1919, 1955. 

[13] Kendall, M. G. and Smith, B. Factor analysis. J. Roy. statist. Soc. Ser. B, 1950, 12, 
60-94. 

[14] Lawley, D. N. The estimation of factor loadings by the method of maximum likelihood. 
Proc. Roy. Soc. Edinburgh, 1940, 60, 64-82. 

[15] Rao, C. R. Estimation and tests of significance in factor analysis. Psychometrika, 1955, 
20, 93-111. 

[16] Roy, S. N. Some aspects of multivariate analysis. New York: Wiley, 1957. 

[17] Thomson, Sir Godfrey. The factorial analysis of human ability. Boston: Houghton 
Mifflin, 1939. 

[18] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1940.

[19] Tucker, L. R. A method of synthesis of factor analysis studies. Personnel Research
Section, Rep. No. 984, 1951. 


Y. AHMAVAARA AND T. MARKKANEN. The Unified Factor Model. Its Position in Psychometric 
Theory and Application to Sociological Alcohol Study. Vol. 7. Helsinki: The Finnish 
Foundation for Alcohol Studies, 1958. Pp. 187. Stockholm: Almqvist and Wiksell, 
distributors. 


This monograph consists of two parts. The first, by Ahmavaara, is entitled “A Treatise
on Psychometric Models,” and the second, authored by Markkanen, “On the Sociological
Theory of Alcohol in Terms of the Unified Factor Analysis Model.” 

The title of the first part is misleading. It is not at all a treatise of psychometric models 
such as may be found in [2, 3, 10, 12]. Instead, and much to the annoyance of the reader, it 
represents an exceedingly poor presentation of some selected techniques of scaling with 
severe criticisms attached, which verge on the polemic (pages 40 to 47 on Guttman’s prin- 
cipal components), and a eulogy on the first author’s own methods. The reviewer, who 
enjoyed the author’s earlier On the Unified Factor Theory of Mind because of its interesting 
and compelling logical development, and in spite of its grotesque mathematics (transfor- 
mation analysis), was not at all impressed by the present volume. A glance at some of the 
chapter headings and statements, 


“Sec. 5 A Critical Examination of Guttman’s Principal Components” 
“Sec. 7 Erroneous Imitations of...”
“Sec. 9 Critical Examination of the Radex System”

“A requirement of such a kind is arbitrary to the highest degree” (p. 46)

“Mathematical artifacts” (p. 46)

“lacking empirical foundation” (p. 46)

“Moreover, there occur in the afore-mentioned work (Mathematical Thinking in
the Social Sciences) such statements of fundamental nature as are to be regarded as
simply erroneous...” (p. 56)

“there are many equations suggested by Rashevsky which are devoid of any other
significance ...” (p. 59)

“I protest...” (p. 73)

etc., etc., 


and, on the other hand, where the author refers to his own methods and proposals, 


“Our approach to the problem of mathematical models in sociology differs from
that of ‘Mathematical Thinking...’ in that ours tends to be a systematic one.” (p. 15)

“A very fine example of the application of discriminance analysis...” (p. 27)

“It has recently been shown in an objective and numerical manner . . .” (the author
speaks, of all things, about his transformation analysis) (p. 62)

“What essentially makes factor analysis a unified model is its third step, the trans- 
formation.” (p. 80) 

“A synthesis of different factor analyses could be secured for the first time in an 
objective manner.” (p. 84) 

etc., etc., 


should discourage the reader from paying any further attention to the first author’s vain 
suggestions. 

Ahmavaara, in this “Treatise,” discusses ratio scales and ordinal scales; his failure to
mention the nominal and, above all, the interval scales at all [9] makes the discussion rather 
difficult to follow. Especially in the attack upon Rashevsky (p. 58) and the use of differential 
equations in the social sciences, Ahmavaara’s failure to mention the interval scale defeats 
his own purpose. 

After a short description of Guttman’s scale, the author suggests the use of discrimi- 
nance analysis (with, incidentally, an impossible graph on page 25). The idea is not much 
different from that used in one of the oldest types of scales [5] and, of course, no properties 
of the discriminant curve are studied. How necessary such a study is may be illustrated by 
the loose application of Guttman’s reproducibility index or Menzel’s coefficient of scal- 
ability [6] which depend on the number of items. The gain in reproducibility was reported 
as one of the most important features of the H-technique [11]. An elementary statistical 
investigation similar to that of Festinger [1, 4] shows the spuriousness of such a gain. The 
creation of suitable indices to prove a point is one of the most prevalent types of statistical 
lies, and we will have to await further study of the properties of Ahmavaara’s discriminance 
curves before using them. 

The initial description of Lazarsfeld’s measurement is a little sketchy but quite well 
done. Alas, there follow some statements regarding the applicability and restrictions of 
Lazarsfeld’s measurement which show that the author is even less familiar with the prin- 
ciples than the average reader. One of his criticisms of Lazarsfeld’s latent class analysis 
is that 


“The Guttman measurement provides one with better possibilities as regards
mathematical model building in psychology and sociology” (p. 39).

Considerable advances in the analysis of results based upon nominal scales [7, 8] are appar- 
ently unknown to the author. These methods deserve closer study before categorical state- 
ments are made. 

The author does not distinguish between maximum probability and increasing prob- 
ability trace lines, and this fact, along with his failure to mention interval and nominal 
scales, makes him come to the conclusion that many things need to be done 


“before it is possible to say anything definite about the usefulness of the Lazarsfeld 
measurement in psychological and sociological research” (p. 25).


Since the author distinguishes only ratio scales and ordinal scales, he suggests that 
“Guttman Measurement” is the only useful one in psychology and sociology and maintains
that this is the type of scale which has been used intuitively in mental test theory. The 
author rejects the practice of standardization (p. 37) and is extremely critical of normal- 
ization (p. 82). 

Ahmavaara’s description of Guttman’s principal components is no description at all. 
It is a highly loaded criticism, sometimes advanced with polemic technicalities. To be sure, 
the name intensity function, which Guttman used for the description of the second eigen- 
vector, may be a little vague and poorly chosen. However, Ahmavaara’s disproof by 
semantics, in that he disclaims connection between a subject’s expressed intensity of opinion 
and Guttman’s second eigenvector, is reminiscent of political rather than scientific con- 
troversy. Ahmavaara is very critical of the meaning of the variance ratio which Guttman 
minimizes; he directs his criticism not against the ideas, which are not presented here 
anyway, but against the final formula. Ahmavaara’s counter-example is, to quote the author 
on page 46, “lacking any empirical foundation.” One may not be too strongly impressed
by Guttman’s principal components, but the language and polemic of this criticism is most 
certainly unjustified. 

In his section on the theory of mathematical model building, the author presents at 
some length the distinctions between Cartesian and Hilbertian analysis, and restricts the 
former to ratio measurements. He argues that ordinal scales find their mathematical ex- 
pression in terms of the dependence between vectors in the function space, and hence 
concludes that factor analysis is the appropriate technique to be used with this kind of 
measurement. As a matter of fact he states that (p. 54) 


“Factor analysis is, indeed, but a certain simple instance of the general Hilbertian 
analysis.”


In two theorems (p. 56 and p. 59) he states the restriction of Cartesian analysis to ratio 
measurement and says that the Hilbertian type of analysis can be applied also to scales 
on which only the order is determined. He has to admit (p. 61) that the metric has some 
effect upon the results of Hilbertian analysis but states that 


“whatever conventions we may make concerning the metrics of the different scales, 
the Hilbertian vector model may always tell us something of value about the 
mutual relationships of the functions” (p. 62). 


The reviewer would like to remind the author that the correlation between $x$ and $x^2$, two
collinear variables, is zero if $x$ is standard normal, but this argument may be considered
unjustified in that $x^2$ does not monotonically vary with $x$. However, let $x$ be positive and
distributed as $\chi^2$ with one degree of freedom. In this case $x^2$ is certainly monotonic
with $x$ and the correlation turns out to be $\sqrt{3}/2$, corresponding to an angular separation
of 30 degrees for two vectors which, before the conventions were made on the metric, were
collinear. This does tell us something of value; it tells us that some process, like
normalization, is very important before we attempt an analysis based upon correlations.
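The 30-degree figure can be checked from the raw moments of a chi-square variable with one degree of freedom, which are 1, 3, 15, and 105 for the first four powers. The sketch below is only a verification of the arithmetic; the function name is an assumption made for illustration.

```python
import math

def chi2_1df_moment(n):
    """n-th raw moment of a chi-square variable with one degree of freedom.

    E[x^n] = 1 * 3 * 5 * ... * (2n - 1).
    """
    result = 1
    for k in range(1, n + 1):
        result *= 2 * k - 1
    return result

m1, m2, m3, m4 = (chi2_1df_moment(n) for n in (1, 2, 3, 4))  # 1, 3, 15, 105

cov_x_x2 = m3 - m1 * m2  # cov(x, x^2) = 12
var_x = m2 - m1 ** 2     # var(x) = 2
var_x2 = m4 - m2 ** 2    # var(x^2) = 96

r = cov_x_x2 / math.sqrt(var_x * var_x2)  # 12 / sqrt(192)
angle = math.degrees(math.acos(r))
print(r, angle)
```

The correlation 12/sqrt(192) equals sqrt(3)/2, about 0.866, which is the cosine of 30 degrees.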








It is indicative of the author’s prejudice that he describes Thurstone’s box problem 
without mentioning the fact that all measurements were normalized before correlations 
were computed. Failure to do so leads to serious distortions of the nonlinear combinations. 

The author’s description of Guttman’s simplex, radex, and circumplex is critical as 
indicated by the heading. Since Ahmavaara’s treatise contains not a single constructive 
idea in this respect, his criticism lacks authority; however, the point which Ahmavaara 
raises regarding the indeterminacy of simplex or circumplex structures, when several of
these are present, has received and may deserve further study. 

In Chapter IV, Ahmavaara repeats, in abbreviated form, his description of factor 
analysis as an abstractive theory. This was quite well done in his earlier monograph On the 
Unified Factor Theory of Mind and discussed in an earlier review. The mathematics used 
by Ahmavaara in his transformation analysis is beyond the comprehension of this reviewer. 
How, from the fact implied by equation (43) (p. 82) that $A_1' A_1 L = A_1' F_2$, he concludes any
similarity between the matrices $A_1 L$ and $F_2$ (both are rectangular) is incomprehensible. I
would suggest that the author try the column vector with elements (.3, .5) for $A_1$ and (.8, .2)
for $F_2$. Here, $L$ is the scalar 1 and, if .3 seems too similar to .8, he may try (.9, .14) which,
again, by the author’s reasoning, is similar to $A_1$. Did he, perchance, cancel $A_1'$ on the left
and right side?
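The reviewer's counter-example is easily verified. With the column vector (.3, .5) on one side, both suggested vectors give the same scalar solution of the normal equation, namely 1, although neither resembles (.3, .5). The variable names in this sketch are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a1 = [0.3, 0.5]
candidates = ([0.8, 0.2], [0.9, 0.14])

# For column vectors the normal equation (a1'a1) L = a1'f2 has a scalar solution.
solutions = [dot(a1, f2) / dot(a1, a1) for f2 in candidates]

for f2, scale in zip(candidates, solutions):
    print(f2, round(scale, 10))
```

Both cases give a scalar of 1 (up to rounding), so a transformation equal to the identity is no evidence that the two vectors are alike.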

We read in the summary of Part I that this treatise has been a “systematic presenta-
tion of the theoretical thinking of the ‘Finnish school.’” This reviewer suggests that the
author gather some more experience with existing methods and literature before aspiring 
to the establishment of a new school. 

In Part II, T. Markkanen presents the results of a very extensive analysis of data
obtained by Dr. Pekka Kuusi on alcohol sales experiments in rural Finland. Dr. Kuusi, 
whose original work was not available to the reviewer, was interested in studying the 
changes in attitude due to the opening of alcohol stores in several rural communities. Ten 
different alcoholic beverages were listed, some available on the legal market, some illegal, 
and subjects were asked to state how many days ago they last drank each of these beverages. 
The results, coded (but not scaled) on a nine-point scale, constitute variates 1 to 10. Variates 
11-15 deal with general questions related to drinking (frequency, opinion, etc.), and variates 
16-25 represent other activities and concomitant information. 

The first factor analysis was made for the general group which (see p. 108) represents 
all the interviewed persons (N = 293). The primary factor matrix after rotation is recorded
and represents, according to the author, a satisfactory simple structure. The reviewer does 
not quite agree; with 25 variables and 9 factors, a total of 13 zero loadings (= .10) can be 
obtained in a hyperplane by a pure 50-50 chance, in the absence of any concentration. 
Only one of the factors (5) shows significant over-determination. This does not mean that 
the factors are spurious; as a matter of fact they show a fairly convincing high-low tendency. 
It only means that the authors may consider using some more effective rotation method to 
clean up and clearly define their structure. The method they used (Ahmavaara’s cosine 
method) seems to be wanting in this respect. The authors report and interpret the following
factors: (1) Physical and Mental Activeness, (2) Social Control, (3) Pastime and Passive 
Enjoyment, (4) Asocial Drinking (preference of illegal beverages), (5) Religion (but un- 
associated with the rest), (6) Form of Manifestation of Drunkenness, (7) Underdeveloped 
Drinking, (8) Attitude and Opinion, and (9) Legal Drinking. 

There follows the transformation analysis of the two groups, and the agreement 
between the two patterns (graph on page 116) looks quite extraordinary. Equally extra- 
ordinary, however, is the agreement between the two correlation matrices (pp. 178 and 180). 
Most extraordinary, alas, is the composition of the two groups, for one of them (N = 254)
represents the users of alcohol, whereas the other represents the general group (N = 293).
Thus, the two groups which are compared in the “consistent and unique” transformation
analysis, have all 25 responses of 254 subjects in common and differ only in that the general
group has 39 subjects added to it. A beginning student in some scientific discipline may be
forgiven for making this kind of error of comparison. It is less easy to be tolerant of the 
promoters of new, systematic, scientific theories. 
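Why such a comparison must look impressive can be sketched with simulated data. The scores below are artificial, and the 39 added subjects are drawn from the same distribution as the rest, which is an assumption of this illustration and not a property of the original survey. A correlation computed on 254 subjects is compared with the one computed on the same 254 plus 39 others, and, for contrast, with one from an independent sample.

```python
import math
import random

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated scores on two variables for a "general group" of 293 subjects.
x = [rng.gauss(0, 1) for _ in range(293)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]

r_general = correlation(x, y)            # all 293 subjects
r_users = correlation(x[:254], y[:254])  # the 254 subjects shared by both groups

# An independent sample of 254 new subjects, for contrast.
x_new = [rng.gauss(0, 1) for _ in range(254)]
y_new = [0.5 * a + rng.gauss(0, 1) for a in x_new]
r_independent = correlation(x_new, y_new)

print(r_general, r_users, r_independent)
```

The first two values agree closely because they rest largely on the same observations; only a comparison with a separate sample says anything about the stability of the result.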

The authors make more comparisons, on the basis of age, employment, etc.; as an 
index of comparison they use the mean score on each factor. It apparently does not occur 
to them that a straightforward comparison of the results of each variable would be much 
easier and considerably more meaningful. Perhaps they might even condescend to use a 
t, F, or multiple-range test to show whether such group differences are significant. 

In conclusion it must be said that this monograph places the users and defenders of 
factor analysis in an embarrassing position. Theirs is a mathematical model, well related to 
the real world, soundly formulated and, for the last 25 years, extensively studied. The results 
of many demonstration studies, starting with Thurstone’s box problem, constitute strong 
evidence in favor of the method. The statistical methods connected with factor analysis 
have been well developed and known for more than 25 years—only the exact numerical 
solutions are quite hard to find, and are only now feasible with the aid of electronic machin- 
ery. But how can anyone convince skeptics of the usefulness of factor analysis if authors, 
like the present ones, make a travesty of it? 

ROLF BARGMANN
Virginia Polytechnic Institute 


REFERENCES 


[1] Festinger, L. The treatment of qualitative data by scale analysis. Psychol. Bull., 1947, 
44, 149-161. 

[2] Green, B. F. Attitude measurement. In G. Lindzey (Ed.), Handbook of social psychology. 
Cambridge, Mass.: Addison Wesley, 1954. Pp. 335-369. 

[3] Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954. 

[4] Guttman, L. On Festinger’s evaluation of scale analysis. Psychol. Bull., 1947, 44, 
451-465. 

[5] Likert, R. A technique for the measurement of attitude scales. Arch. Psychol., 1932, 
No. 140. 

[6] Menzel, H. A new coefficient for scalogram analysis. Publ. Opin. Quart., 1953, 17, 
268-280. 

[7] Mitra, S. K. Contributions to the statistical analysis of categorical data. Univ. North 
Carolina, Inst. Statist. Mimeo. Ser. No. 142, 1955. 

[8] Roy, S. N. and Kastenbaum, M. A generalization of analysis of variance and multi- 
variate analysis to data based on frequencies in qualitative categories or class intervals. 
Univ. North Carolina, Inst. Statist. Mimeo. Ser. No. 131, 1955. 

[9] Stevens, S. S. On the theory of scales of measurement. Science, 1946, 103, 670-680. 

[10] Stouffer, S. A., Guttman, L., Suchman, E. A., and Lazarsfeld, P. F. Measurement and 
prediction. Studies in social psychology during World War II, Vol. 4. Princeton: Prince- 
ton Univ. Press, 1950. 

[11] Stouffer, S. A., Borgatta, E. F., Hays, D. G., and Henry, A. F. A technique for im- 
proving cumulative scales. Publ. Opin. Quart., 1952, 16, 273-291. 

[12] Torgerson, W. S. Theory and methods of scaling. New York: Wiley, 1958.


SOLOMON KULLBACK. Information Theory and Statistics. New York: John Wiley & Sons,
Inc., 1959. Pp. xvii + 395. 


Information theory may well be a blood relative of the law of large numbers. At 
least the famous coding theorem and the famous law seem to share a direct interest in 
estimation processes based on fluctuating observations, and both settle the estimation
problem by the simple expedient of taking a nearly infinite sample. So it is easy to think
of both as bypassing statistics. 

What then are we to make of the provocative title of this book? Information theory 
seems to mean many different things, and the author is apparently not talking about the 
coding theorem or about channel capacity. In fact, a statistician like Kullback cannot
get himself into a lather of excitement over the main results of the theory, reflecting as 
they do a type of asymptotic process already familiar in statements like the law of large 
numbers. On the contrary, the part of information theory that intrigues him is the 
arithmetic, the p log p formula. 

The book is best described as a treatise on the inequality 


\[ \sum p_1 \log \frac{p_1}{p_2} \ge 0, \]


where $p_1$ and $p_2$ are any two probability distributions over the same set of categories.
The inequality is easily proved, and it immediately generates nearly all the attractive 
features of information measures. For example, only a few additional steps are required 
to show that information is maximum when the values of pz are all equal, or that con- 
ditional probabilities yield a smaller amount of information than marginals, or that trans- 
mitted information is always equal to or greater than zero, and so on. Actually Kullback’s 
statement of the inequality is more general, and the reader will have to pay for the gener- 
ality with increased effort, but the many interesting and occasionally remarkable properties 
of the inequality put the price within reason. 
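A few of these properties can be checked numerically. The sketch below uses two hypothetical distributions over three categories; the function names are assumptions made for illustration and natural logarithms are used throughout.

```python
import math

def divergence(p1, p2):
    """Sum of p1 * log(p1 / p2) over categories (terms with p1 = 0 contribute 0)."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2) if a > 0)

def entropy(p):
    return -sum(a * math.log(a) for a in p if a > 0)

p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.5, 0.3]
uniform = [1 / 3] * 3

d_12 = divergence(p1, p2)
d_11 = divergence(p1, p1)
d_uniform = divergence(p1, uniform)

print(d_12)  # positive
print(d_11)  # zero when the two distributions coincide
# Against a uniform distribution the inequality reads log(3) - entropy(p1) >= 0,
# so no distribution over three categories has entropy above log(3).
print(math.log(3) - entropy(p1), d_uniform)
```

The last line shows the step behind the maximum-information property: the divergence from the uniform distribution is exactly the shortfall of the entropy from its largest possible value.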

In any event Kullback converts the formula to legitimate statistics by considering 
$p_2$ as a probability that emerges from a “null” hypothesis, and $p_1$ as an analogous probability
based on an alternative hypothesis. When sample estimates are substituted for $p_1$ and $p_2$,
the formula becomes a random variable and its size measures the divergence of the two 
hypotheses as seen in the data. This is a picture of divergence almost identical to the one 
given by the likelihood ratio, and Kullback demonstrates that information measures in 
the form of the inequality are in fact negative logs of likelihood ratios. This is not a new 
idea but the detail of Kullback’s treatment surpasses the earlier literature on the subject 
by a wide margin. 

The idea is carried through a succession of null hypothesis tests based on several 
different assumed populations extending from the simple binomial to the multivariate 
normal. These rubrics, tests and populations, furnish the structure for much of the writing 
as Kullback analyzes the divergence formula under each heading. Perhaps the most inter- 
esting set of tests is found in the chapter on contingency tables where the asymptotic 
chi-square distribution of the likelihood ratio is linked to a surprising array of contingency 
tests. It is worth noting that likelihood ratio tests are essentially equivalent to the more 
familiar chi-square tests. The only important difference is that, with a suitable table of 
n log n, the likelihood ratios are much easier and faster to compute. Kullback provides 
an excellent table in an appendix. 
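The remark about n log n can be made concrete for a test of independence in a contingency table. In the sketch below the table is hypothetical and the function names are assumptions; the likelihood ratio statistic is assembled entirely from n log n terms for the cells, the margins, and the total, and is compared with the familiar chi-square statistic.

```python
import math

def n_log_n(n):
    """The tabulated quantity n log n, with 0 log 0 taken as 0."""
    return n * math.log(n) if n > 0 else 0.0

def likelihood_ratio_statistic(table):
    """2 * [sum n_ij log n_ij - sum n_i. log n_i. - sum n_.j log n_.j + N log N]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    cells = sum(n_log_n(n) for r in table for n in r)
    margins = sum(map(n_log_n, rows)) + sum(map(n_log_n, cols))
    return 2 * (cells - margins + n_log_n(total))

def pearson_chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, n in enumerate(r):
            expected = rows[i] * cols[j] / total
            stat += (n - expected) ** 2 / expected
    return stat

table = [[30, 20], [15, 35]]  # hypothetical 2 x 2 contingency table
g = likelihood_ratio_statistic(table)
x2 = pearson_chi_square(table)
print(g, x2)
```

For this table the two statistics come out close to one another (roughly 9.2 and 9.1), and the likelihood ratio version needs no expected frequencies, only additions and subtractions of tabulated values.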


Columbia University WILLIAM J. McGILL


R. D. LUCE AND H. RAIFFA. Games and Decisions. New York: John Wiley and Sons, 1957.
Pp. 509. 


The preface of this book would serve as an excellent review. In three printed pages 
the authors set out their scope and objectives. The reviewer would need only to comment 
that they have fully achieved their goals. 

The purpose of the book is to communicate the central ideas and results of game
theory and related decision-making models in such form as to minimize the mathematical
prerequisites. In principle the main text is mathematically self-contained and no specific
mathematical preparation is assumed. In the authors’ words, “neither the calculus nor 
matrix algebra as such are required, but neither will hinder, for probably the most im- 
portant prerequisite is that ill-defined quality: mathematical sophistication.’”’? However, 
the final quarter of the book is devoted to mathematical appendices which require con- 
siderably more mathematical knowledge. 

The book might well have been subtitled “15 years after” since it provides the first
general sequel to the von Neumann-Morgenstern treatise (Theory of Games and Economic
Behavior, Princeton: Princeton University Press, 1944, 1947). The emphasis is almost
totally on the concepts of game theory from the point of view of their appropriateness in
social science contexts. Although little attention is given to the mathematical details of
solutions to specific games, the appendices and references to the bibliography provide an 
excellent guide to a solid and complete mathematical treatment of game theory. The 
authors have not sacrificed preciseness in the interest of easy reading. If a concept is 
difficult they carefully introduce it with well-chosen illustrative examples and instructive 
description, then provide a full formal treatment. 

The first three chapters provide a general introduction to the theory of games in- 
cluding utility theory. Chapter 4 treats two-person, zero-sum games. Chapters 5 and 6 
treat two-person, nonzero-sum games and present concepts developed in an attempt to 
meet some of the deficiencies in the von Neumann-Morgenstern theory, and Chapters 
7-12 treat n-person games beginning with the von Neumann-Morgenstern theory and 
reaching into many newer developments. The last two chapters, 13 and 14, treat individual 
and group decision making and again report progress largely since von Neumann-Morgenstern.

Although the book is directed primarily toward the general scientific reader and is 
colored by a social science point of view, its comprehensive and critical review of game 
theory makes it required reading for any person doing mathematical research in the field. 

The exposition is uniformly excellent. The careful interplay among (1) illustrative 
and provocative examples, (2) instructive analyses, and (3) formal developments makes 
for easy reading. The book is well adapted (1) for general reading, (2) for reference, and 
(3) for seminar study. The reviewer’s main regret is the absence of exercises; if these were 
available the book would make a fine text for an introductory course in game theory directed 
toward students who have had an introduction to mathematics at least equivalent to the 
Social Science Research Council sponsored summer institutes of 1953 and 1955 or to one 
of the one-year courses in mathematics for social scientists such as those initiated at 
Michigan and Illinois in 1952 and which have since spread across the nation. The authors 
indicate several groupings of chapters to make the book most useful to various classes
of readers.

In the reviewer’s judgment Games and Decisions is and will remain for a long time 
the definitive work in the conceptual side of game theory; every quantitatively oriented 
social scientist could profit by reading it carefully and having it available on his working 
bookshelf. 


University of Michigan R. M. Thrall


Raymond B. Cattell. Personality and Motivation: Structure and Measurement. New York:
World Book Company, 1957. Pp. xxiv + 948. 


This book, according to the author, is a progress report on factor-analytic research 
on basic personality dimensions. It is intended chiefly for applied psychologists, university
teachers, and students with a sound grasp of statistics and principles of psychological
measurement. The book is arranged in six parts, which range from some fairly elementary 
concepts of measurement, through measurement techniques as applied specifically to the 
area of personality, to theory and findings with respect to personality and motivation. 
Twelve appendices cover various subjects more or less peripheral to the book itself. A 
bibliography listing more than 700 references and a glossary defining nearly 500 terms 
complete the book. It is recommended that the latter be studied thoroughly before the 
book is read since the author is quite adept at using familiar terms in unfamiliar ways 
and has developed many new descriptive words for his factors and techniques. 

Part I, Basic Principles in Personality Research, sets the stage for the remainder 
of the book by presenting reasons why measurement and theory must proceed hand-in- 
hand, by describing the three media of personality observation (observation by others 
through use of rating scales; observation by the self through use of questionnaires; and 
observation by use of objective tests); and by a rather thorough discussion of the role 
of factor analysis. The author’s customary distinction between clusters (which he states 
to be surface traits) and factors (which he states to be source traits) is elaborated as is 
his contention that factors represent measures of the underlying dynamics of personality 
which result in observed clusters of manifest behaviors which intercorrelate highly. 

In Part II, findings from a number of factor analyses of personality measures based 
on ratings, questionnaires, and objective tests are summarized. Part IV presents similar 
findings in the areas of attitude and motivation, and Part V contains the results of several 
factor analyses of personality change. Interpretations of and speculations concerning 
these factors are present in great abundance. However, the book itself does not present 
sufficient data concerning factor loadings, correlations between factors, and stability of 
findings across samples to permit evaluation of these interpretations or of the factor analyses 
themselves without a tremendous amount of “library research” on the cited references.

Part III continues the discussion of measurement theory commenced in Part I, but 
at a somewhat more sophisticated level. Of special interest in this section are results of 
second-order factor analyses of the author’s primary factors. These second-order factors 
are practically orthogonal, in contrast to the substantial correlations frequently found 
among Cattell’s primaries, and appear to be similar both in meaning and number to the 
first-order factors found by other investigators. 

The final part is concerned with the application of personality measurement in the 
clinical, educational, and industrial fields. The use of the “specification equation,” with 
its attendant factor profiles for people and requirement profiles for job or classroom situa- 
tions, is discussed at length. A number of common clinical and industrial measurement 
situations are touched upon and a test battery (usually one published by Cattell’s
Institute for Personality and Ability Testing) is recommended for each.

Although presented as a progress report, this book appears to be basically an ex- 
position of Cattell’s theories of personality, which are considerably in advance of the 
experimental data necessary to confirm them. Viewed as such it contains a wealth of ideas 
and suggestions for research, but does not appear suitable as a textbook, at least at the 
undergraduate or first-year graduate student level. 


Personnel Laboratory Ernest C. Tupes
Lackland Air Force Base, Texas


D. R. Cox. Planning of Experiments. New York: John Wiley and Sons, Inc.; London:
Chapman and Hall, Ltd., 1958. Pp. vii + 308.


A. E. Maxwell. Experimental Design in Psychology and the Medical Sciences. London:
Methuen and Co., Ltd.; New York: John Wiley and Sons, Inc., 1958. Pp. 147. 


Each of these books bears the Wiley imprint, concerns the conduct of experiments, 
and was written at the University of London. (David Cox, a statistician, is reader in 
statistics in Birkbeck College; Maxwell, a psychologist, is lecturer in statistics in the Institute
of Psychiatry, Maudsley Hospital.) Aside from these superficial similarities, there
are considerable differences between the two volumes. 

Cox’s Planning of Experiments is significantly titled; this excellent book concerns 
the broader aspects of design—and not analysis—of experiments. Further, it adds markedly 
to already available books in this field, being much more concerned with the planning 
stages than most books incorporating the word “design” in their title. For example, it
treats in some detail the nature of treatment factors—qualitative and quantitative, and 
the choice of their number and level. Techniques for effective design—randomization, 
grouping of experimental units into homogeneous blocks, use of covariates, etc.—are pre- 
sented in their intuitive reasonableness to the experimenter, for whom the book is written. 
Justification for these techniques is not made to depend upon the analysis of variance. 
Continual reference, however, is made to standard sources for the analyses; Cox is care- 
ful to point out that the benefits of efficient design are only realized if correspondingly 
efficient analyses are used. On the other hand, no amount of analysis can completely com- 
pensate for lack of good design; this is the reason this book is important. 

Planning of Experiments aims to make the experimental scientist aware of the possi- 
bilities in experimental design, and for simpler designs, to enable him to construct them. 
Very little quantitative background is presumed; concepts (but not rigorous definitions) 
of standard error, significance, confidence interval, and power are developed in a persuasive 
fashion. Although basically a reference, this book could be used as a supplementary text 
for a course in experimental design or research methodology (it contains no exercises, 
however). 

The style of the book is lucid and well fitted to independent study. Each chapter 
starts with an introduction and ends with a concise summary. Recommendations are made 
as to certain sections to be omitted, depending upon the reader’s interest. Self-help sug- 
gestions are made, e.g., in discussing response curves, the reader not familiar with the 
properties of second-degree equations is invited to sketch a few. In presenting numerical 
examples of the presence and absence of interaction, Cox first gives instances where residual 
error is zero, a useful device making the interaction and main effects more apparent. The 
concept of residual error is articulated by the notion, carried out in numerical examples, 
of successively subtracting out the various estimated effects from the original observations.
It would be well, however, at some point to introduce the general model in the following 
form (using Cox’s notation and style): 


\[
\text{observation} = \text{over-all mean}
+ \left[\,\text{mean observation for that treatment} - \text{over-all mean}\,\right]
+ \left[\,\text{mean observation for that block} - \text{over-all mean}\,\right]
+ \text{res.}
\]


As a text or reference, the lack of exercises as such is offset by the abundance of good ex- 
amples in considerable detail, well integrated with the exposition. Examples cover a wide 
range of fields, and nearly all represent actual experiments. Several examples are from 
psychology, principally experimental and educational. 
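
The successive subtraction of estimated effects described above, and the general model the reviewer suggests, can be sketched for a small randomized block layout as follows. The observations and the function name `decompose` are hypothetical, invented only for illustration; they are not taken from Cox's examples.

```python
def decompose(obs):
    """Split a treatments-by-blocks table into over-all mean, effects, residuals.

    obs[i][j] is the observation for treatment i in block j (one per cell).
    """
    n_treat = len(obs)
    n_block = len(obs[0])
    grand = sum(sum(row) for row in obs) / (n_treat * n_block)
    # Treatment effect: mean observation for that treatment minus over-all mean.
    treat_eff = [sum(row) / n_block - grand for row in obs]
    # Block effect: mean observation for that block minus over-all mean.
    block_eff = [sum(col) / n_treat - grand for col in zip(*obs)]
    # Residual: what is left after subtracting the mean and both effects.
    resid = [
        [obs[i][j] - grand - treat_eff[i] - block_eff[j] for j in range(n_block)]
        for i in range(n_treat)
    ]
    return grand, treat_eff, block_eff, resid

# Hypothetical data: 2 treatments (rows) in 3 blocks (columns).
observations = [[10.0, 12.0, 14.0], [13.0, 15.0, 20.0]]
grand, treat_eff, block_eff, resid = decompose(observations)
print(grand, treat_eff, block_eff, resid)
```

Adding the four pieces back together reproduces each original observation exactly, and the residuals sum to zero within every treatment and every block.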

In terms of content, the book may be divided into two parts: Chapters 1 to 9, giving
in fairly complete detail certain basic designs, and Chapters 10-14, introducing and outlining
the standard cases of the more complex designs. Chapter 1 gives a general introduction,
indicating when experimental designs are useful, in contrast to survey techniques,
and enumerating five requirements for good design—unbiasedness, precision, ability to 
generalize, simplicity, and measurable error. Next, the concept of reduction of error is 
introduced, first by allocation of treatments to experimental units (randomized blocks, 
Latin squares) and second by use of concomitant variables. Adjustment of treatment 
means for concomitant variation is given considerable attention and illustrated graphically. 

Randomization receives an unusually extensive treatment, including its methods, 
properties, and justification. (A minor flaw: one of the methods suggested for entering a 
random number table is to think of some numbers to designate page, row, and column; 
such a nonrandom start might, of course, lead to bias or dependence in continued applica- 
tions.) Randomized assignment of treatments to experimental units is compared with 
systematic or subjective assignment. Occurrence of extreme permutations as a result 
of randomization is considered, and recommendations are made for preventing or dealing 
with this case. 

Factorial designs are considered at some length. Careful distinction is made between 
quantitative, ranked qualitative, and qualitative factors; the latter are further designated 
as specific (Model I) or sampled (Model II). (It is asserted that quantitative factors are 
nearly always Model II.) Response curves and response surfaces are developed and illustrated
by numerical examples. In the chapter on the choice of the number of observations,
five bases are presented for estimation of precision (or equivalently, error): observed
variation between experimental units, higher order interactions, theoretical considerations, 
within-unit variance, and past experience. 

Chapters 10 to 14 cover a wide range of additional techniques of experimental design, 
among them Graeco-Latin and higher order squares, balanced incomplete blocks, Youden 
squares, lattice squares, partially balanced incomplete blocks, fractional replication, con- 
founding, cross-over designs. For use of these designs, Cox recommends consultation with 
a statistician or reference to more advanced works. These chapters, however, serve well 
in giving the experimenter a concept of the possibilities. In the final chapter, Cox considers 
briefly a collection of techniques: search for optimum conditions (experimental conditions 
under which a particular quantity is maximized); assays, especially bioassays; trend-free 
systematic designs (small experiments where precision under an assumed trend is achieved 
at the expense of randomization, and hence, estimation of error); and the case in which 
certain treatment arrangements are inadmissible. 

The book contains extensive author and subject indices, tables of random digits 
and random permutations, references for each chapter, and a useful general bibliography, 
briefly annotating eleven standard references on the design and analysis of experiments. 

In summary, the coverage, style, and level of this book recommend it highly for in- 
dependent study in the planning of experiments. 

Maxwell’s Experimental Design in Psychology and the Medical Sciences is quite a 
different book from Cox’s. As early as the preface, one is put on guard against this slim 
volume; there the curious statement is made that this book “differs from the general run 
of statistical textbooks for psychologists and medical men, which tend to concentrate on 
descriptive statistics” [italics mine]. In psychology, at least, this remark seems more historical
than contemporary, considering such widely used psychological statistics texts as
Walker and Lev, McNemar, or the experimental design texts of Lindquist and Edwards. 

This book, directed to psychologists and medical researchers, assumes no prior
statistics; a sketchy introduction is given in the first chapter. Following that, separate,
brief chapters are accorded randomized blocks, Latin and Graeco-Latin squares, factorial
designs, cross-over designs, balanced incomplete blocks, linear regression and product
moment correlation (five pages), analysis of covariance, inadmissible treatment arrangements,
systematic designs, and relative efficiency. Coverage of Cox’s development of
systematic designs and inadmissible treatments is comparatively excessive for a book
which does not even mention the more common topics of fractional replication, split plot,
partially balanced incomplete blocks, and lattice designs, and which merely makes reference
to confounding. (Cox himself allots the two topics only seven pages in his much more complete
book.) No consideration is given to such basic concepts as fixed and random effects, expected
mean squares, or power. The underlying model for an analysis is seldom explicitly
stated.

There are instances of passages which are at best misleading. For example, on page
50 it is stated that “When randomization, too, has been properly carried out in an experiment,
analysis of variance procedures and the use of the variance ratio test are valid
whether or not the variate being sampled is normally distributed in the population. . . .”

Nor is the book redeemed by the mode of presentation. The first chapter institutes 
a precedent for jumping into the middle of things by treating a test for the difference in
means of two samples, but never the one sample case. References are scanty. Many standard 
works are omitted, and several of the books are cited in outdated editions. 

In general, the book presents somewhat the impression of a personal handbook or 
set of notes, competently prepared for the use of the compiler, but of little use to others 
in that form. 


University of Chicago Jack SAWYER 


Anne Anastasi. Differential Psychology. (3rd edition) New York: Macmillan, 1958. Pp.
xii + 664. 


This is certainly the outstanding text in differential psychology in both its thorough- 
ness of coverage and the level of sophistication of its treatment of research findings. Some 
idea of its comprehensiveness may be gained from the distribution of the numbers of refer- 
ences at the ends of the chapters—ranging from 27 to 150, with the median between 93 
and 96. 

Despite the increase in volume of literature cited, this edition is smaller in size than 
the second edition. There are 18 chapters instead of 24 and 631 pages of text instead of 867. 
Some of the economy has been effected by omitting or shortening discussions of statistical 
and measurement techniques, and by reducing the detail in the discussions of the researches 
presented. These changes may be regarded as good or as bad, depending upon one’s view 
of what the proper function of a textbook is. It has long been my opinion that on the college 
level a textbook for a course of the “survey” type should serve primarily as a source and 
reference book for the student, with free rein given to the author to state his own inter- 
pretations of the material presented. Anastasi’s book meets these criteria well. The findings 
of investigations on a given topic are well integrated, and critical evaluations of particular 
studies are in many cases made with enough generality that students with interests in the 
details of experimental design and data analysis might be stimulated to read some of the 
references in the original. 

I think I am not alone in detecting in Anastasi a tendency to be lenient in criticising
researches whose conclusions place heavy weight on environment as a determiner of human
abilities. The most flagrant example probably is her treatment of Bernardine Schmidt’s
famous study. Although she is a bit more severe in this edition than in the previous one,
Anastasi’s summary statement in this connection is, “Whether the specific procedures
utilized by Schmidt can be expected to produce gains as large as those claimed, however,
must remain an unanswered question for the present. In any event, it appears evident that
the effectiveness of many of the techniques would be restricted to certain types of cases”
(p. 405). It should hardly be necessary to justify a categorical No to the “unanswered
question.”

All in all, however, Anastasi’s position on the heredity-environment question, as she 
herself formulates it, is a reasonable and defensible one, and certainly the most practical. 
For example, “The question should be reformulated in terms, not of how much, but of how.
What we need to know is the modus operandi—the way in which specific hereditary and 
environmental factors operate in producing specific differences in behavior” (p. 83). Her
most extreme statement, perhaps, is in the final chapter: “It is not the race, or sex, or physical
‘type’ to which the individual belongs by heredity that determines his psychological
make-up, but the cultural group in which he was reared, the traditions, attitudes, and
points of view impressed upon him, and the type of abilities fostered and encouraged”
(p. 604). One must admit that this statement has not yet been proved false, and the point 
of view Anastasi assumes is one that college students should be able to evaluate, and have 
fun in the process. 

In decided contrast to the general maturity expected of students who use the book is 
the plane of some of the discussions of elementary statistical measures. There seems to be 
scant justification for including in a text of this sort a description and explanation of 
frequency distributions, frequency polygons, histograms, and the like (pp. 24 ff.). These 
materials add unnecessary bulk to the book, and it is doubtful that students who need 
this kind of rudimentary work can profit from the rest of the book anyway. Why not assume 
adequate command of statistics, or leave it to the instructor to supply it? 

The attempt to present, and yet condense, statistical methods is likely to lead to 
oversimplification, errors, and misleading statements. Anastasi’s fairly elaborate discussion 
of the basis for statistical regression (p. 204) is inadequate because it fails to mention that 
the direction of the regression depends upon the shape of the true-score distribution. In a 
U-shaped distribution, for example, regression would be away from the mean. 
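
The reviewer's point can be checked with a small calculation. The sketch below compares the expected true score, given an observed score, under two assumed true-score distributions: a normal distribution, and a two-point distribution at -1 and +1 used here as an extreme stand-in for a U shape. Normally distributed, independent errors of measurement are also an assumption of the sketch. None of these specifics comes from Anastasi's text or from the review.

```python
import math

def expected_true_normal(x, true_sd, error_sd):
    """E[true | observed = x] for true ~ N(0, true_sd^2), error ~ N(0, error_sd^2).

    The observed score is shrunk toward the mean (0) by the reliability.
    """
    reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
    return reliability * x

def expected_true_two_point(x, error_sd):
    """E[true | observed = x] for true = -1 or +1 with equal probability.

    With normal errors the posterior odds of +1 versus -1 are
    exp(2x / error_sd^2), so the conditional mean is tanh(x / error_sd^2).
    """
    return math.tanh(x / error_sd ** 2)

observed = 0.3   # hypothetical observed score, slightly above the mean of 0
error_sd = 0.5   # hypothetical standard error of measurement

toward = expected_true_normal(observed, true_sd=1.0, error_sd=error_sd)
away = expected_true_two_point(observed, error_sd=error_sd)
print(toward, away)
```

Under the normal assumption the expected true score lies closer to the mean than the observed score. Under the two-point assumption it lies farther away for observed scores between the two humps, while observed scores beyond the humps are still pulled inward. So in this sketch the direction depends both on the shape of the distribution and on where the observed score falls.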

In the section on factor analysis the following statements appear (pp. 330, 331): “If 
by factor analysis we find that five factors are sufficient to account for all the common 
variance covered by these twenty tests, we can substitute these five new dimensions for 
the original twenty in describing each individual. . . . In any event, the number of necessary 
scores required to cover the behavior domain surveyed by the original test battery would 
be reduced from twenty to five in the process.” Although factors are limited to accounting
for common variance in the first part of the first sentence, the rest of that sentence and the 
second sentence seem to imply the inclusion of unique variance also. 

The objections I have just enumerated are relatively minor and easily remedied. 
When the book deals with the main topics of differential psychology it is excellent. Students 
and teachers alike should find courses in which it is used stimulating and provocative. 


University of Michigan John E. Milholland


R. Olavi Viitamäki. Personality Traits Between Puberty and Adolescence. Annales Academiae
Scientiarum Fennicae, Ser. B, Tom. 104, Helsinki: Suomalainen Tiedeakatemia,
1956. Pp. 183.


This is a report on the relationships between scholastic achievement and “certain 
ability, temperament and dynamical traits.” A battery of ability tests, the Wartegg drawing
completion test, and the Zulliger shortened variation of the Rorschach test were given to a 
sample of high school boys and girls, some of whom took the projective tests again at college 
three years later. Almost everything that could have been done with the data has been 
done. Reported for the sexes separately are multiple correlations with high school and 
college achievement, combined and separate factor analyses, and differences between the
means and correlations between the two testings with the projective devices. There is also
a Q-technique analysis for those with the best matriculation results. 

The ability tests are too restricted in content and too inadequately described for the 
factorial results to be of any interest outside Finland. Projective testers may be able to 
understand the interpretations given the factors obtained, but even they may be disturbed 
to find that over three years boys change to become both more introversive and more 
extraversive! To those who don’t speak the projective language or who lack the necessary 
intuition, the results will have little appeal. Some developmental psychologists may find 
merit in the evidence concerning personality changes between ages 15 and 18, provided
they can interpret and justify the changes and disregard the fact that, out of enough differences
considered, some must turn out to be statistically significant. Consistently overlooked are
the need for cross validation in such a fact-finding study, and the role of experimental 
dependence between variables in the use of factor analysis. Interpretation of factorial dif- 
ferences between boys and girls and between ages 15 and 18 in terms of differentiation is 
questionable, especially considering that methods of communality estimation and of deciding 
the number of factors are not given. 
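
The remark that some differences must turn out significant when enough are considered can be made concrete with a short calculation. The sketch assumes the tests are independent and that every null hypothesis is true; both are simplifying assumptions, and the number of tests used below is hypothetical rather than a count taken from Viitamäki's report.

```python
def prob_at_least_one_significant(n_tests, alpha=0.05):
    """Chance of one or more 'significant' results among n_tests true nulls.

    Assumes the tests are independent, each carried out at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha

# Hypothetical number of differences examined.
n_tests = 20
print(prob_at_least_one_significant(n_tests), expected_false_positives(n_tests))
```

With twenty such tests at the 5 per cent level, the chance of at least one spurious result is already well over one half, which is why cross validation matters in a fact-finding study of this kind.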

Too much of the report consists of mention of material for its own sake rather than 
for its relevance. There are chapters on the nature of personality, on the constancy of per- 
sonality traits, on measurement in psychology, on factors in the Rorschach, and on the 
prediction of scholastic success from ability and personality tests, but their content is never 
related to what follows. For example, Viitamäki insists that factor analysis is a hypothesis-testing
technique, yet uses it in a fact-finding way.

Perhaps of most interest is the application of a method of rotation suggested by 
Ahmavaara, but not elsewhere reported, though it is implied in Ahmavaara’s writings. 
The primary factor axes are located through those tests which have the lowest ratios of 
their first centroid loadings to the lengths of their vectors, that is, along those vectors 
furthest from the first centroid. Rotation is made directly to oblique primary factor pattern. 
In the analysis reported, this method of rotation yields a fair simple structure. But it may 
overemphasize single variables in the location of hyperplanes and might better be used for 
selection of the first trial vectors in the single plane method of rotation. 
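
The selection rule described above can be sketched as follows, going only as far as the reviewer's description allows: for each test, take the ratio of its first centroid loading to the length of its vector, and pick the tests with the lowest ratios as candidate locations for the primary axes. The loadings and the function name are hypothetical, and the sketch stops at selection; it does not carry out the oblique rotation itself and should not be read as Ahmavaara's or Viitamäki's actual procedure.

```python
import math

def select_axis_tests(loadings, n_axes):
    """Return indices of the n_axes tests farthest from the first centroid.

    loadings[i] holds the centroid loadings of test i, first centroid first.
    The ratio (first loading / vector length) is the cosine of the angle
    between the test vector and the first centroid axis, so the lowest
    ratios mark the vectors farthest from that axis. Rows are assumed to
    have nonzero length.
    """
    ratios = []
    for index, row in enumerate(loadings):
        length = math.sqrt(sum(a * a for a in row))
        ratios.append((row[0] / length, index))
    ratios.sort()
    return [index for _, index in ratios[:n_axes]]

# Hypothetical centroid loadings for four tests on two factors.
loadings = [
    [0.7, 0.1],
    [0.5, 0.6],
    [0.6, -0.5],
    [0.8, 0.0],
]
chosen = select_axis_tests(loadings, n_axes=2)
print(chosen)
```

Because each axis is pinned to a single test vector, one unusual variable can fix the position of a hyperplane, which is the weakness the reviewer points out.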

The translation is quite fair, though there are some strange terms (e.g., prestations), 
and some oddities, such as: “Marking done by the teachers is also incoherent!”

All in all, there has been a lot of busy-work and much being wise after the event. One
gets lost in the wealth of data, and this is aggravated by the highly intuitive interpretations. 
The report is below the standard of others in the Series and will interest a limited audience 
only. 


University of Sydney J. A. RADCLIFFE 


D. H. Stott. The Social Adjustment of Children. Manual to the Bristol Social Adjustment
Guides. London: University of London Press, 1958. 


The Bristol Social Adjustment Guides, developed by D. H. Stott and E. G. Sykes, 
“offer a method for detecting and diagnosing maladjustment, unsettledness, or other 
emotional handicap in children of school age.” There are separate forms for The Child
in School (Boy), The Child in School (Girl), The Child in Residential Care, and The Child 
in the Family. Each form consists of groups of descriptive phrases in various categories. 
The observer’s task is simply to underline the phrases which are appropriate. For example, 
the forms for The Child in School include (i) Attitude to Teacher, (ii) Attitude to School 
Work, Games, and Play, (iii) Attitude to Other Children, (iv) Personal Ways and Physique. 
Each of these forms has groups of phrases such as 


Greeting teacher: Over-eager to greet / greets normally / sometimes eager / etc.


Categories and content are pertinent and coverage is good; phraseology is clear. Scoring
is by template, awkward for any large number of cases; the summary forms, although 
effectively designed, seem unnecessarily large. 

The authors consider children’s problems in terms of a number of diagnostic categories,
each falling along a continuum from mild “unsettledness” to severe “maladjustment,”
except XC which does not extend into the maladjustment end. These categories are:
U-W, unforthcomingness-withdrawal; D, depression; XA, anxiety or uncertainty about 
adult interest and affection; HA, hostility to adults; K (for knavery!), an attitude of 
unconcern for adult approval; XC, anxiety for approval of and acceptance by other children; 
HC, hostility to other children; Restlessness (at the maladjustment end this becomes 
psychosomatic disorder or physical defect); and a miscellaneous group. 

The author’s basic concept is that maladjustment results from over-readiness of 
the “executive reactions” (i.e., normal responses to unfavorable situations) as a result 
of excessive or prolonged activation. His discussion of the patterns which tend to occur 
is insightful, but in places this discussion is somewhat more dogmatic about causal se- 
quences than the present state of our knowledge would seem to warrant. 

The long chapter on the development of the Guides is introduced by a general discus- 
sion of epistemological and methodological problems, as background and justification for 
the largely nonstatistical technique followed in the construction. The extremely laborious 
and somewhat naive procedure is open to technical criticism, but it must be noted that 
the criteria for each step were in terms of psychological sense—a standard sometimes 
lacking in studies with more statistical sophistication. Unfortunately, in this description 
some essential data are omitted, e.g., adequate description of the subjects on whom the
Guides were tried out. It would also appear that there was contamination of the criterion 
classes by the test scores. There are no data on reliability and no norms. 

From the general nature of the scales, their content, and their coverage, they might 
well meet their author’s intent to provide “a clinical instrument by which a comprehensive 
report of how the child behaves and reacts in real life can be furnished to the psychologist 
or psychiatrist, and a system for the interpretation of the behaviour . . . a means of judging 
whether a child is suffering from emotional difficulties, such as might be the cause of failure 
in school-work, or which might act as a warning sign of the possibility of delinquent break- 
down ... in the training of teachers as a framework for the observation and study of 
children. . .” From the Manual, however, there is no way of judging whether or not they
are likely to be effective. 


Harvard University Anne Roe


Alphonse Chapanis. Research Techniques in Human Engineering. Baltimore, Md.: The
Johns Hopkins Press, 1959. Pp. ix + 316. 


In 1956 Chapanis published a monograph entitled The Design and Conduct of Human 
Engineering Studies (San Diego, Calif.: San Diego State College Foundation, pp. iii + 73). 
The present volume is an expanded version of the monograph, with the addition of one new 
chapter. As before, the author continues reluctantly to use the phrase human engineering 
rather than some more desirable but less popular term such as engineering psychology, bio- 
mechanics, human factors engineering, or ergonomics. The aims of the author are (1) a 
description of the methods available to the human engineer, and (2) a presentation of 
“principles and guide lines about ways of doing dependable studies on people.” Although 
elementary in nature, the book is intended to provide background information to individuals 
who are concerned with experiments relating men to machines. Thus the audience should 
include those engaged in industrial engineering, operations research, experimental psychol- 
ogy, systems engineering, and scientific management. 

The introductory chapter proposes that the tactics and strategy of science can be 
learned from a book—thus justifying the present work—and states that research on people 
is the most difficult kind of experimentation. After dismissing common sense as a reliable 
standard for design decisions, some common pitfalls in studying people are reviewed. 

The remaining seven chapters present the methods available to the practicing human 
engineer who might be called upon to perform research in order to solve some man-machine 
problem. Under methods of direct observation we find operator opinions, activity sampling, 
process analysis (including link analysis), and micromotion techniques. A chapter on acci- 
dents and near accidents describes the critical-incident technique. Raw data are used to 
describe and illustrate statistical methods which include (in 51 pages!) tabular and graphic 
distributions, measures of central tendency, measures of variability, measures of relationship
(Pearson r only), and significance of differences (t test and F test). The chapter on
experimental methods also uses raw data to illustrate single- and multi-variable designs
(including the Latin square); ten pages are devoted to how much realism is needed in
experiments. The psychophysical methods discussed are average error, limits, and constants. 
The eighth (and new) chapter is on techniques of articulation testing for speech communi- 
cation applications. Chapanis wisely includes an earlier chapter on special problems in 
experimenting with people in which he discusses experimental variables, control of motiva- 
tion, selection of subjects, and apparatus. 

The nonpsychologist reader of this book should gain the level of sophistication about 
experimenting with human beings that the psychology student receives in his basic course 
in experimental psychology (both types of students should, of course, have additional lab- 
oratory exercises). In fact, as a methods book, this volume could well serve as a text for 
an experimental psychology course if suitably supplemented by contextual lectures and 
outside reading. The real contribution made by this volume, however, is its
clear and timely presentation of research techniques to a growing audience of physical and
engineering scientists who have become concerned with the human factor in man-machine 
efficiency. 


Tufts University LEONARD C. MEAD


F. F. STEPHAN and P. J. MCCARTHY. Sampling Opinions. New York: John Wiley and Sons,
Inc., 1958. Pp. 451.


This book presents one of the few sound and comprehensive treatments of sample sur- 
vey methodology. There are available volumes which give a more complete treatment of 
particular phases of survey methodology, e.g., Hansen, Hurwitz, and Madow’s treatment of 
sampling or Hyman’s books on interviewing in social research, but the book under review 
is one of the few which attempts to cover the entire area of survey research and succeeds 
in doing so without degenerating into a collection of pious admonitions. In addition, the 
authors present a good deal of hitherto unpublished data which will be of considerable value 
in actual practical sample design. 

The great strength of this book is its emphasis on the multiplicity of problems encoun- 
tered in sample surveys and the need for considering all of these problems in survey design. 
The weakness of the book is its overemphasis of quota sampling and particularly of the 
type of quota sampling which was popular seven or eight years ago. With the exception of 
this overemphasis, the volume is a very careful and scholarly treatment of the field of sample 
surveys. It is well suited as a text for a general introductory course in the techniques of 
sample surveys and, with supplemental readings (e.g., from Hansen, Hurwitz, and Madow
or from Hyman), could also be used in more advanced courses, since it does bring in a good
deal of original research material. 

There are a few statements with which the reviewer would take issue but these are 
extremely minor. For example, in referring to the ratio of two random variables, the statement
is made that “only an approximation can be obtained from the probability model for
its variance” (p. 202). While the formulas usually given for the variance of such ratios are
approximations, an exact formula is, of course, available. Since a ratio of two random variates
is itself a random variate, it has a variance which can be defined and which can be estimated
by repeated samplings, the exactness of the variance estimate being limited only by the 
number of repeated samplings it is feasible to make. 
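
The distinction drawn here can be illustrated numerically. What follows is a minimal sketch and is not taken from the book under review: it assumes a hypothetical population of paired values (x, y), draws repeated simple random samples with replacement, and compares the usual first-order (Taylor-series) approximation to the variance of the ratio of sample means with the variance actually observed over the repeated samplings. The function names, the population, and the sample sizes are all illustrative assumptions.

```python
import random
import statistics


def taylor_ratio_variance(xs, ys, n):
    """First-order (Taylor-series) approximation to Var(ybar / xbar) for
    simple random samples of size n drawn with replacement."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    r = my / mx  # population ratio
    var_x = statistics.pvariance(xs)
    var_y = statistics.pvariance(ys)
    cov_xy = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return (var_y - 2.0 * r * cov_xy + r * r * var_x) / (n * mx * mx)


def replicated_ratio_variance(xs, ys, n, replications, rng):
    """Variance of ybar / xbar observed over repeated samplings."""
    pairs = list(zip(xs, ys))
    ratios = []
    for _ in range(replications):
        sample = rng.choices(pairs, k=n)  # sampling with replacement
        total_x = sum(pair[0] for pair in sample)
        total_y = sum(pair[1] for pair in sample)
        ratios.append(total_y / total_x)
    return statistics.pvariance(ratios)


# Hypothetical population: x is strictly positive, y is roughly proportional to x.
rng = random.Random(0)
xs = [rng.uniform(50.0, 150.0) for _ in range(2000)]
ys = [0.6 * x + rng.gauss(0.0, 10.0) for x in xs]

approx = taylor_ratio_variance(xs, ys, n=50)
observed = replicated_ratio_variance(xs, ys, n=50, replications=4000, rng=rng)
print(f"Taylor approximation: {approx:.3e}")
print(f"Repeated samplings:   {observed:.3e}")
```

Under these assumptions the two figures should lie close together; the precision of the second is limited only by the number of replications, which is the point made above.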

Chapter 10, on the variability of quota sampling, glosses over the main point that 
the mean square error of a quota sample estimate is of primary interest. However, even if 
one takes this treatment in its own terms and considers only the variance, the treatment is 
somewhat misleading. There is a comparison of the variance of quota sampling with the 
variance of random binomial sampling which concludes that the variance of a quota sample 
result “can be approximated in a rough fashion by the variance of the binomial model for 
random sampling when it is multiplied by a suitable factor of the order of 1.5” (p. 233).
Actually the data are insufficient to support any definitive conclusions—even as cautious 
a conclusion as the one just quoted. In addition, it would be more appropriate to compare 
the variance of quota sample estimates with the variance of estimates based on a clustered 
probability sample rather than with the variance of simple random sampling. I strongly 
suspect that a comparison between quota sampling and clustered random sampling (with 
the same size and geographic distribution of interviewers’ assignments) would show that 
quota sampling gives a lower rather than a higher variance. 
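
The quantities at issue in this paragraph can be put in arithmetic form. The sketch below is illustrative only: it takes the factor of 1.5 from the passage quoted above, uses the standard identity that mean square error equals variance plus squared bias, and assumes a hypothetical proportion, sample size, and bias that do not come from the book.

```python
def binomial_variance(p, n):
    """Variance of a sample proportion under simple random (binomial) sampling."""
    return p * (1.0 - p) / n


def inflated_variance(p, n, factor=1.5):
    """Binomial variance multiplied by a rough factor, as in the quoted statement."""
    return factor * binomial_variance(p, n)


def mean_square_error(variance, bias):
    """Mean square error = variance + (bias squared)."""
    return variance + bias ** 2


# Hypothetical survey: true proportion 0.5, 1000 interviews, bias of 3 points.
p, n, bias = 0.5, 1000, 0.03
variance = inflated_variance(p, n)
mse = mean_square_error(variance, bias)
print(f"variance = {variance:.6f}, squared bias = {bias ** 2:.6f}, MSE = {mse:.6f}")
```

With these assumed figures the squared bias is larger than the variance, which is why a treatment confined to the variance can understate the error of a quota sample estimate.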

A particularly valuable feature of the book is Chapter 12 on analysis of field operations. 
While the analysis was carried out on a quota sample, a good deal of it is applicable to 
surveys which use probability sampling. This chapter is especially valuable in emphasizing 
some of the factors leading to high and low costs and to high and low biases and sampling 
variances. 

The publication of this book emphasizes again the great methodological improvements 
in survey design which have occurred in the past 20 years—improvements to which the 
authors of the book have made substantial contributions. As Hyman notes in his book on 
interviewing, an awareness of defects in methodology is a sign of sophistication and of prog- 
ress rather than of weakness and, in this respect, the survey field is well ahead of other 
areas in the social sciences and of many areas in the natural sciences as well. 


National Analysts Inc. ELI S. MARKS


S. N. ROY. Some Aspects of Multivariate Analysis. New York: John Wiley and Sons, Inc.,
1957. Pp. viii + 214.


This monograph is the first of a series to be published by the Indian Statistical Institute.
The main body of the book is a collection of journal papers by the author, his students, and 
colleagues, slightly rewritten for uniformity of style and presentation. Several chapters, one 
on the general theory of tests of hypotheses and two on properties of the multivariate 
normal distribution and related sampling statistics, have been added providing background 
for the notation used throughout the book. The final 77 pages are taken up by nine appen- 
dices containing proofs of various theorems needed in the body of the work. 

Professor Roy sets for himself the task of obtaining confidence bounds for certain 
functions of the parameters of one or several multivariate normal distributions, where the 
functions are chosen to be natural measures of deviation from the usual null hypotheses. 
The approach is novel. He first defines a class of statistical tests which have “good” properties.
One property is that the tests can easily be inverted to obtain confidence bounds.
Another is that the method of test construction leads naturally into a set of simultaneous 
confidence bounds; that is, in 95 percent (say) of such experiments each confidence bound 
in the set will contain the true value of its corresponding parametric function. Then he 
discusses the power of the tests (and thus the “shortness” of the confidence bounds) and
obtains lower bounds for the power functions, and finally, develops the confidence bounds 
associated with the class of tests. These include confidence bounds on means and linear 
functions of means, on the characteristic roots of variance-covariance matrices, and on 
regression functions. 
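
The meaning of “simultaneous” in this paragraph can be illustrated by simulation. The sketch below does not reproduce Roy's construction; as a simpler stand-in it uses Bonferroni-adjusted intervals for several independent normal means with known unit variance, and counts the fraction of simulated experiments in which every interval in the set contains its true value. All names and settings are illustrative assumptions.

```python
import random
from statistics import NormalDist


def joint_coverage(k, n, per_interval_level, trials, rng):
    """Fraction of simulated experiments in which all k intervals contain
    the true mean (taken as 0, with known standard deviation 1)."""
    z = NormalDist().inv_cdf(0.5 + per_interval_level / 2.0)
    se = 1.0 / n ** 0.5  # standard error of each sample mean
    half_width = z * se
    hits = 0
    for _ in range(trials):
        # Each sample mean is drawn directly from its sampling distribution.
        means = [rng.gauss(0.0, se) for _ in range(k)]
        if all(abs(m) <= half_width for m in means):
            hits += 1
    return hits / trials


rng = random.Random(1)
k, alpha = 5, 0.05
separate = joint_coverage(k, 25, 1.0 - alpha, 20000, rng)
bonferroni = joint_coverage(k, 25, 1.0 - alpha / k, 20000, rng)
print(f"five separate 95% intervals, all cover: {separate:.3f}")
print(f"five Bonferroni intervals, all cover:   {bonferroni:.3f}")
```

Separate 95 percent intervals jointly cover their true values noticeably less than 95 percent of the time, whereas a simultaneous set is built so that the joint statement holds at the stated level.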

In the last chapter Roy discusses the application of the same class of tests to multi- 
variate categorical data. Here he makes the important but often neglected distinction 
between a classification whose marginal totals are fixed in advance and one whose marginal 
totals are random variables. The distinction does not affect the test criterion but rather 
determines the class of alternative hypotheses to be considered. It is also useful in pointing 
up analogies between contingency table problems and analysis of variance problems. 

The proofreading of the book is less than adequate, particularly considering the small 
amount of redundancy in a mathematical equation. The reader who, like the reviewer, is 
annoyed to find that he is reading a continued story will hope that the wait for the “later 
monograph” which is promised so often throughout the book will not be too long. 

It is not likely that the ultimate consumer of statistical methods will find this book 
worth his while. But the psychological statistician interested in multivariate problems will 
profit from a careful study of this work. 

J. E. KEITH SMITH
Lincoln Laboratory, 
Massachusetts Institute of Technology 


CORRECTION 


An erratum which appeared in Psychometrika, Volume 24, p. 404, 
December, 1959, unfortunately included a typographical error. The final
symbol should read $q_j$, not $p_j$. Thus the erratum would read as follows.

In Cureton, Edward E., Note on $\phi/\phi_{\max}$. Psychometrika, 1959, 24,
89-92, the first sentence of paragraph 2, page 89, should read “It is well
known that $\phi$ can equal $+1$ only if $p_i = p_j$, and that it can equal $-1$ only
if $p_i = q_j$ ([1], p. 324; [2], p. 342).”

The editorial staff joins the William Byrd Press in promising more 
diligently to “mind our p’s and q’s.”
